I am trying to construct a Lucene text index for a Jena TDB that I have already built. From the documentation (http://jena.apache.org/documentation/query/text-query.html#building-a-text-index), it seems I have to load the TDB using tdbloader (my TDB is quite large, so I decided to use tdbloader2) and then run textindexer using an assembler file. Right now I am having trouble loading the TDB that needs to be indexed.
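For reference, here is a rough sketch of how the existing TDB directory could be named inside the assembler file, so that nothing needs to be reloaded. It follows the general layout shown in the jena-text documentation, but the class names and vocabulary may differ between Jena versions. The file name text-config.ttl, the choice of rdfs:label as the indexed property, and the use of luceneIndexes as the index directory are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: paths are taken from this thread, names marked "assumed" are hypothetical.
TDB_DIR="NetBeansProjects/mdr-older/trunk/tdb"               # existing TDB directory
LUCENE_DIR="NetBeansProjects/mdr-older/trunk/luceneIndexes"  # assumed index location
ASSEMBLER="text-config.ttl"                                  # assumed file name

# Write an assembler description that points at the TDB directory itself.
cat > "$ASSEMBLER" <<EOF
@prefix :     <http://localhost/jena_example/#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:  <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:   <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .

[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .

[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset     rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    text:index   <#indexLucene> .

# The existing TDB is referenced by directory; no tdbloader2 run is involved.
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "$TDB_DIR" .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:$LUCENE_DIR> ;
    text:entityMap <#entMap> .

# rdfs:label is an assumed choice of property to index.
<#entMap> a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "text" ;
    text:map ( [ text:field "text" ; text:predicate rdfs:label ] ) .
EOF

# Print the indexer command from this thread instead of running it.
echo "java -cp \$FUSEKI_HOME/fuseki-server.jar jena.textindexer --desc=$ASSEMBLER"
```

The point of the sketch is that the directory path goes into tdb:location in the assembler description, and the indexer is then given only the --desc argument.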
I am doing this from the terminal. As far as I know, the text indexer does not take a directory:

java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --desc=assembler_file

On Fri, Aug 9, 2013 at 12:30 PM, Rob Vesse <[email protected]> wrote:

> Brad
>
> I still don't understand what you are trying to do here?
>
> If you already have a TDB dataset then you do not need to run tdbloader2
>
> If you need to pass a TDB dataset to another Jena utility such as Fuseki
> or the text indexer that expects a TDB location then you should simply
> pass in the directory path to the directory where the TDB dataset is
> located.
>
> Rob
>
> On 8/9/13 9:21 AM, "Brad Moran" <[email protected]> wrote:
>
> >I tried doing this with all files in my TDB:
> >
> >jena-2.10.2/apache-jena-2.10.2-SNAPSHOT/bin/tdbloader2 --loc
> >NetBeansProjects/mdr-older/trunk/tdb
> >NetBeansProjects/mdr-older/trunk/tdb/GOSP.dat
> >NetBeansProjects/mdr-older/trunk/tdb/GOSP.idn
> >NetBeansProjects/mdr-older/trunk/tdb/GPOS.dat
> >NetBeansProjects/mdr-older/trunk/tdb/GPOS.idn
> >NetBeansProjects/mdr-older/trunk/tdb/GSPO.dat
> >NetBeansProjects/mdr-older/trunk/tdb/GSPO.idn
> >NetBeansProjects/mdr-older/trunk/tdb/journal.jrnl
> >NetBeansProjects/mdr-older/trunk/tdb/node2id.dat
> >NetBeansProjects/mdr-older/trunk/tdb/node2id.idn
> >NetBeansProjects/mdr-older/trunk/tdb/nodes.dat
> >NetBeansProjects/mdr-older/trunk/tdb/nodes.dat-jrnl
> >NetBeansProjects/mdr-older/trunk/tdb/OSP.dat
> >NetBeansProjects/mdr-older/trunk/tdb/OSP.idn
> >NetBeansProjects/mdr-older/trunk/tdb/OSPG.dat
> >NetBeansProjects/mdr-older/trunk/tdb/OSPG.idn
> >NetBeansProjects/mdr-older/trunk/tdb/POS.dat
> >NetBeansProjects/mdr-older/trunk/tdb/POS.idn
> >NetBeansProjects/mdr-older/trunk/tdb/POSG.dat
> >NetBeansProjects/mdr-older/trunk/tdb/POSG.idn
> >NetBeansProjects/mdr-older/trunk/tdb/prefix2id.dat
> >NetBeansProjects/mdr-older/trunk/tdb/prefix2id.idn
> >NetBeansProjects/mdr-older/trunk/tdb/prefixes.dat
> >NetBeansProjects/mdr-older/trunk/tdb/prefixes.dat-jrnl
> >NetBeansProjects/mdr-older/trunk/tdb/prefixIdx.dat
> >NetBeansProjects/mdr-older/trunk/tdb/prefixIdx.idn
> >NetBeansProjects/mdr-older/trunk/tdb/SPO.dat
> >NetBeansProjects/mdr-older/trunk/tdb/SPO.idn
> >NetBeansProjects/mdr-older/trunk/tdb/SPOG.dat
> >NetBeansProjects/mdr-older/trunk/tdb/SPOG.idn
> >
> >Then get:
> >
> >11:58:38 -- TDB Bulk Loader Start
> >11:58:38 Data phase
> >INFO Load: NetBeansProjects/mdr-older/trunk/tdb/GOSP.dat -- 2013/08/09 11:58:40 EDT
> >Exception in thread "main" org.apache.jena.atlas.AtlasException: java.nio.charset.MalformedInputException: Input length = 1
> >at org.apache.jena.atlas.io.IO.exception(IO.java:206)
> >at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:79)
> >at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:156)
> >at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:139)
> >at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:251)
> >at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:244)
> >at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:169)
> >at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:108)
> >at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41)
> >at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:130)
> >at org.apache.jena.riot.RiotReader.parse(RiotReader.java:115)
> >at org.apache.jena.riot.RiotReader.parse(RiotReader.java:93)
> >at org.apache.jena.riot.RiotReader.parse(RiotReader.java:66)
> >at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:163)
> >at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
> >at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
> >at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
> >at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:81)
> >Caused by: java.nio.charset.MalformedInputException: Input length = 1
> >at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
> >at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:338)
> >at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
> >at java.io.InputStreamReader.read(InputStreamReader.java:184)
> >at java.io.Reader.read(Reader.java:140)
> >... 17 more
> >
> >Is it possible that I only need certain files from my TDB directory? I am
> >pretty sure my TDB is not malformed, because I have run a lot of queries
> >on it successfully.
> >
> >On Thu, Aug 8, 2013 at 5:34 PM, Andy Seaborne <[email protected]> wrote:
> >
> >> On 08/08/13 20:33, Brad Moran wrote:
> >>
> >>> I am trying to use tdbloader2 on mac os 10.8.4 from command line. I have
> >>> all my triples successfully loaded into a tdb stored in directory "tdb." I
> >>> am sure it is successfully loaded because I can run any query on it
> >>> successfully. So I try:
> >>>
> >>> apache-jena-2.10.1/bin/tdbloader2
> >>> --loc=NetBeansProjects/mdrolder/trunk/luceneIndexes
> >>> NetBeansProjects/mdrolder/trunk/tdb
> >>
> >> The command format is
> >>
> >> tdbloader2 --loc LOC DATAFILE DATAFILE2 DATAFILE3 ....
> >>
> >> The data comes from files, not a directory.
> >>
> >> It does not take a directory and find all the files. You have to
> >> enumerate the files.
> >>
> >> Andy
> >>
> >>> And I get Exception in thread "main" org.apache.jena.atlas.AtlasException:
> >>> java.io.FileNotFoundException: NetBeansProjects/mdrolder/trunk/tdb (Is a
> >>> directory).
> >>> Which makes sense, I just do not understand how I am supposed to load the
> >>> tdb though. Should I just use one of the files within the TDB?
> >>>
> >>> Thanks,
> >>> Brad
