I hit a similar error with jena-2.10.1, but on a different character, when parsing a more recent Freebase dump, freebase-rdf-2013-08-04-00-00:

WARN [line: 4632165, col: 55] Bad IRI: < http://croctail.corpwatch.org/#cw_506630,cw_{key}> Code: 4/UNWISE_CHARACTER in FRAGMENT: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
ERROR [line: 4643044, col: 35] Unknown char: $(36;0x0024)

Yuhan Zhang
Senior Software Engineer
OneScreen Inc.
www.onescreen.com
(949) 525-4825 Ext: 177
[email protected]

On Tue, Jan 8, 2013 at 4:24 AM, Andy Seaborne <[email protected]> wrote:

> On 08/01/13 11:49, Rob Vesse wrote:
>
>> 2.10.0 is the current development snapshot, you can get this via maven by
>> setting the version for your Jena dependencies to 2.10.0-SNAPSHOT
>>
>> If you need to download the JARs (I.e. non-maven builds) you can find them
>> on the Apache artifactory at
>> https://repository.apache.org/index.html#nexus-search;quick~jena
>>
>> You need to click on Show All Versions for the module you want in order to
>> see download links for snapshots
>>
>> Rob
>
> And the download is available at:
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena/
>
> (cough - see message of 28/Dec in this thread)
>
> Andy
>
>> On 1/8/13 11:45 AM, "Abhishek Shivkumar" <[email protected]> wrote:
>>
>>> 1. I am using the correct version of rdf file that you have.
>>> 2. This error of unknown char (\92) is appearing in all the files at
>>>    different line numbers. I am not sure what this unknown char \(92) is.
>>>    Tried to look in the surrounding of the line number in the file contents
>>>    but can't find it :(
>>> 3. I can only find version 2.7.4 at
>>>    http://www.apache.org/dist/jena/binaries/.
>>>    May be THIS is the reason. Do you know where I can download the
>>>    2.10.0 version?
>>>
>>> Thanks much!
>>>
>>> Thank you!
>>>
>>> With Regards,
>>> Abhishek S
>>>
>>> On Tue, Jan 8, 2013 at 5:26 AM, Andy Seaborne <[email protected]> wrote:
>>>
>>>> On 08/01/13 11:00, Abhishek Shivkumar wrote:
>>>>
>>>>> Hi Andy,
>>>>>
>>>>> I am using the script to correct the errors. When I run the script dwim
>>>>> on all the part files, it shows error messages, and continues processing.
>>>>> Are these errors that are corrected, or still existing that need
>>>>> attention?
>>>>> Sample error message is:
>>>>>
>>>>> ERROR [line:25335, col:25] Unknown char: \(92)
>>>>
>>>> What's on the lines around there?
>>>> And if you've split the dump, which file?
>>>>
>>>> That needs correcting in the source. I can pare the first 30k lines of
>>>> the file with Jena with no fixups.
>>>>
>>>> Maybe you don't have exactly the version of Freebase that I did
>>>> freebase-rdf-2012-12-23-00-00.gz. There is no suspect forms around
>>>> line 25K of my copy.
>>>>
>>>> ns:award.award_winner ns:type.type.instance ns:m.03cpgmq.
>>>> ns:award.award_winner ns:type.type.instance ns:m.05x3tbk. <---25335
>>>> ns:award.award_winner ns:type.type.instance ns:m.05q_rp.
>>>>
>>>> You also need the latest version of Jena (recent 2.10.0 SNAPSHOT).
>>>>
>>>>> Just wanted to know if we can ignore these messages while running the
>>>>> dwim script.
>>>>
>>>> You can ignore WARN. ERRORs usually stop the parser as they indicate
>>>> structural problems.
>>>>
>>>> Andy
>>>>
>>>>> Thank you!
>>>>>
>>>>> With Regards,
>>>>> Abhishek S
>>>>>
>>>>> On Sat, Dec 29, 2012 at 1:58 PM, Andy Seaborne <[email protected]> wrote:
>>>>>
>>>>>> If you want to parse the Freebase dump, try this:
>>>>>>
>>>>>> http://people.apache.org/~andy/Freebase20121223/Notes.txt
>>>>>>
>>>>>> It takes about 90 minutes on my home desktop machine to fix and parse
>>>>>> the data.
>>>>>>
>>>>>> To load it, get a very large machine - it has been reported [1] that a
>>>>>> previous dump has been loaded into TDB.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> [1] http://lists.freebase.com/pipermail/freebase-discuss/2012-December/010169.html
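
For anyone hitting the same kind of ERROR, the question asked earlier in the thread ("What's on the lines around there?") can be answered with a few lines of Python. This is only a sketch, not part of Jena or of the fix-up script. The function names open_dump and lines_around are made up for illustration, and it assumes the parser's line numbers are 1-based and that Python splits lines the same way the parser counts them, which may not hold if the dump contains stray carriage returns.

```python
import gzip
import itertools


def open_dump(path):
    """Open a plain or gzip-compressed dump as text.

    Undecodable bytes are replaced rather than raising, so a damaged
    line can still be displayed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", errors="replace")


def lines_around(path, line_no, context=2):
    """Return (line_number, text) pairs for line_no plus `context` lines each side.

    line_no is 1-based, on the assumption that this matches the
    parser's "[line: N, col: M]" messages.
    """
    first = max(1, line_no - context)
    last = line_no + context
    with open_dump(path) as handle:
        # islice streams through the file, so the dump is never held in memory.
        window = itertools.islice(handle, first - 1, last)
        return [(first + i, text.rstrip("\n")) for i, text in enumerate(window)]
```

With the dump from this message (the .gz file name is an assumption), lines_around("freebase-rdf-2013-08-04-00-00.gz", 4643044) would return the five lines centred on the reported ERROR, so the offending `$` can be inspected in context. It still has to read the file from the start up to that line, so expect it to take a while on a dump of this size.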
