On 27/08/13 00:21, Yuhan Zhang wrote:
thanks for the suggestion, Andy. I'll convert the turtle file and retry it.

It seems their encoding of any odd characters is $xxxx; I'm not sure where that came from. After the report on the 20121223 dump I thought they'd cleaned this up - maybe a new one has got through. It's worth a report to the freebase list.

        Andy


Yuhan Zhang
Senior Software Engineer
OneScreen Inc.
www.onescreen.com
(949) 525-4825 Ext: 177
[email protected] <[email protected]>


On Mon, Aug 26, 2013 at 1:29 PM, Andy Seaborne <[email protected]> wrote:

On 26/08/13 18:41, Yuhan Zhang wrote:

Hi Andy,

the line 4643044 looks like this:

ns:m.04ln83j key:user.robert.world$0027s_tallest.building "preceded_by"


Raw $ is not allowed in a prefixed name.

But I guess $0027 is intended (which is ') so use %27.
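
A minimal sketch of that rewrite (not from the thread), assuming every "$" followed by four hex digits is a Unicode code point and should become the percent-encoded UTF-8 bytes of that character. The function name is made up for illustration. Note that it rewrites matches anywhere in the text, including inside string literals, and does not handle surrogate code points.

```python
import re
from urllib.parse import quote

# Matches Freebase-style "$XXXX" escapes: a dollar sign followed by exactly
# four hexadecimal digits (assumed here to be a Unicode code point).
DOLLAR_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def dollar_to_percent(text: str) -> str:
    """Rewrite each $XXXX escape as the percent-encoded UTF-8 bytes of that character."""
    def repl(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # safe="" forces every reserved character to be percent-encoded.
        return quote(char, safe="")
    return DOLLAR_ESCAPE.sub(repl, text)


# A line shaped like the one quoted above.
line = 'ns:m.04ln83j key:user.robert.world$0027s_tallest.building "preceded_by"'
print(dollar_to_percent(line))
# ns:m.04ln83j key:user.robert.world%27s_tallest.building "preceded_by"
```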

- - - - - -

You can use \$ (in which case the URI will have a real $ in it) but some
tools may have problems, or use %24 (in which case the URI will have the 3
chars %-2-4 in it)

[172s]  PN_LOCAL_ESC  ::=  '\' ('_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%')

http://www.w3.org/TR/turtle/#sec-grammar-grammar
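
As a rough illustration of the backslash route (a sketch, not part of the original message): the helper below, whose name is hypothetical, backslash-escapes the PN_LOCAL_ESC characters that can never appear raw in the local part of a prefixed name. It assumes the input has no existing escapes, and it leaves '_', '.' and '-' alone because they are legal raw in most positions (a trailing '.' or leading '-' would still need separate handling).

```python
# PN_LOCAL_ESC characters that are never legal unescaped in a local name.
# '_', '.' and '-' are deliberately left out: they are usually allowed raw.
ALWAYS_ESCAPE = set("~!$&'()*+,;=/?#@%")


def escape_local_name(local: str) -> str:
    """Backslash-escape characters that cannot appear raw in a Turtle local name."""
    return "".join("\\" + ch if ch in ALWAYS_ESCAPE else ch for ch in local)


print(escape_local_name("user.robert.world$0027s"))
# user.robert.world\$0027s
```

With this form the resulting URI contains a real "$", which is the case where some tools may have problems.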

         Andy





On Sat, Aug 24, 2013 at 2:50 AM, Andy Seaborne <[email protected]> wrote:

  On 24/08/13 03:29, Yuhan Zhang wrote:

  I reached a similar error with jena-2.10.1 but with a different character when parsing a more recent version of freebase-rdf-2013-08-04-00-00.


Yes - they keep changing things and don't check much.


   WARN  [line: 4632165, col: 55] Bad IRI: <http://croctail.corpwatch.org/#cw_506630,cw_{key}>
Code: 4/UNWISE_CHARACTER in FRAGMENT: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.


Only a warning - '{' and '}' are not allowed in IRIs.
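
If the warning needs to go away, one option is to percent-encode the two characters before parsing. This is a sketch only, with a hypothetical helper name; bear in mind that the encoded string is a different IRI from the one in the dump.

```python
# Percent-encoded forms of the two characters reported as "unwise" above.
UNWISE = {"{": "%7B", "}": "%7D"}


def encode_unwise(iri: str) -> str:
    """Replace '{' and '}' in an IRI string with %7B and %7D."""
    return "".join(UNWISE.get(ch, ch) for ch in iri)


print(encode_unwise("http://croctail.corpwatch.org/#cw_506630,cw_{key}"))
# http://croctail.corpwatch.org/#cw_506630,cw_%7Bkey%7D
```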


   ERROR [line: 4643044, col: 35] Unknown char: $(36;0x0024)



What's that line?  $ is illegal in some places, but legal in others.
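
For a dump this size, a small script is probably the easiest way to look at one line. The following is a sketch under assumptions: the function name and the "dump.gz" path are invented for illustration, and the file is read as UTF-8 with undecodable bytes replaced.

```python
import gzip
from itertools import islice


def lines_around(path: str, line_no: int, context: int = 2):
    """Return (number, text) pairs for line_no plus `context` lines either side (1-based)."""
    start = max(1, line_no - context)
    stop = line_no + context
    # Read gzip-compressed dumps directly; fall back to plain text otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        # islice uses 0-based indices, so 1-based lines start..stop map to start-1..stop-1.
        window = islice(handle, start - 1, stop)
        return [(start + i, text.rstrip("\n")) for i, text in enumerate(window)]


# Example call (hypothetical path):
# for number, text in lines_around("dump.gz", 4643044):
#     print(number, text)
```

The file is streamed, so nothing beyond the requested window is held in memory.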

          Andy





On Tue, Jan 8, 2013 at 4:24 AM, Andy Seaborne <[email protected]> wrote:

   On 08/01/13 11:49, Rob Vesse wrote:


   2.10.0 is the current development snapshot, you can get this via maven by setting the version for your Jena dependencies to 2.10.0-SNAPSHOT

If you need to download the JARs (i.e. non-maven builds) you can find them on the Apache artifactory at
https://repository.apache.org/index.html#nexus-search;quick~jena

You need to click on Show All Versions for the module you want in order to see download links for snapshots

Rob


  And the download is available at:

https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena/

(cough - see message of 28/Dec in this thread)

           Andy



   On 1/8/13 11:45 AM, "Abhishek Shivkumar" <[email protected]> wrote:

1. I am using the correct version of the RDF file, the same one that you have.
2. This error of unknown char \(92) is appearing in all the files at different line numbers. I am not sure what this unknown char \(92) is. I tried to look around that line number in the file contents but can't find it :(
3. I can only find version 2.7.4 at http://www.apache.org/dist/jena/binaries/. Maybe THIS is the reason. Do you know where I can download the 2.10.0 version?

Thanks much!

Thank you!

With Regards,
Abhishek S


On Tue, Jan 8, 2013 at 5:26 AM, Andy Seaborne <[email protected]> wrote:

    On 08/01/13 11:00, Abhishek Shivkumar wrote:


    Hi Andy,


        I am using the script to correct the errors. When I run the dwim script on all the part files, it shows error messages and continues processing. Are these errors that have been corrected, or do they still exist and need attention? A sample error message is:

ERROR [line:25335, col:25] Unknown char: \(92)


   What's on the lines around there?

And if you've split the dump, which file?

That needs correcting in the source.  I can parse the first 30k lines of the file with Jena with no fixups.

Maybe you don't have exactly the version of Freebase that I did, freebase-rdf-2012-12-23-00-00.gz.  There are no suspect forms around line 25K of my copy.

ns:award.award_winner   ns:type.type.instance   ns:m.03cpgmq.
ns:award.award_winner   ns:type.type.instance   ns:m.05x3tbk.   <---25335
ns:award.award_winner   ns:type.type.instance   ns:m.05q_rp.

You also need the latest version of Jena (recent 2.10.0 SNAPSHOT).



    Just wanted to know if we can ignore these messages while running the dwim script.


   You can ignore WARN.  ERRORs usually stop the parser as they indicate structural problems.
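
To pick the ERROR lines out of a long run, a sketch like the following may help. It assumes log lines shaped like the samples quoted in this thread ("LEVEL [line: N, col: M] message"); the function names are made up for illustration and nothing here is a Jena API.

```python
import re

# Matches lines such as "ERROR [line: 4643044, col: 35] Unknown char: ..."
# and the variant without spaces after the colons.
LOG_LINE = re.compile(
    r"^\s*(?P<level>WARN|ERROR)\s+"
    r"\[line:\s*(?P<line>\d+),\s*col:\s*(?P<col>\d+)\]\s*"
    r"(?P<message>.*)$"
)


def parse_log_line(text: str):
    """Return a dict with level, line, col and message, or None if the line does not match."""
    match = LOG_LINE.match(text)
    if match is None:
        return None
    return {
        "level": match.group("level"),
        "line": int(match.group("line")),
        "col": int(match.group("col")),
        "message": match.group("message"),
    }


def errors_only(lines):
    """Keep only the parsed entries whose level is ERROR."""
    parsed = (parse_log_line(text) for text in lines)
    return [entry for entry in parsed if entry and entry["level"] == "ERROR"]


sample = [
    "WARN  [line: 4632165, col: 55] Bad IRI: <http://croctail.corpwatch.org/#cw_506630,cw_{key}>",
    "ERROR [line: 4643044, col: 35] Unknown char: $(36;0x0024)",
    "ERROR [line:25335, col:25] Unknown char: \\(92)",
]
for entry in errors_only(sample):
    print(entry["line"], entry["col"], entry["message"])
```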

            Andy


    Thank you!


With Regards,
Abhishek S


On Sat, Dec 29, 2012 at 1:58 PM, Andy Seaborne <[email protected]> wrote:

     If you want to parse the Freebase dump, try this:



http://people.apache.org/~andy/Freebase20121223/Notes.txt

  It takes about 90 minutes on my home desktop machine to fix and parse the data.

To load it, get a very large machine - it has been reported [1] that a previous dump has been loaded into TDB.

             Andy

[1] http://lists.freebase.com/pipermail/freebase-discuss/2012-December/010169.html