Auto Indexing in Solr

2013-07-25 Thread archit2112
Hi, I'm using Solr 4's Data Import Handler to index an Oracle 10g XE database. I'm
using full imports as well as delta imports, and I want these processes to be
automatic (e.g. the imports could run on a timer, or be executed as soon as
any data in the database is modified). I searched online and saw people
mention cron and scripts, but I'm not able to figure out how to implement
it. Can you please provide a tutorial-like explanation? Thanks in advance.
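
One common arrangement is a small script that requests the DIH URL, with cron deciding when it runs. Below is a minimal sketch in Python, assuming the handler is reachable at http://localhost:8983/solr/dataimport; the helper names dih_url and trigger are made up for illustration. Cron only covers the timed case: reacting the moment a row changes would need something on the database side to send the request.

```python
"""Trigger a Solr DataImportHandler run; intended to be called from cron."""
import sys
import urllib.parse
import urllib.request

# Assumption: the handler lives at this URL; adjust host, port and core to your setup.
SOLR_DIH_URL = "http://localhost:8983/solr/dataimport"


def dih_url(command, base=SOLR_DIH_URL, **params):
    """Build the request URL for a DIH command such as full-import or delta-import."""
    query = {"command": command}
    query.update(params)
    return base + "?" + urllib.parse.urlencode(query)


def trigger(command, timeout=30):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(dih_url(command), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 dih_trigger.py delta-import
    print(trigger(sys.argv[1]))
```

Saved as, say, dih_trigger.py (a hypothetical name), a crontab entry such as `*/15 * * * * python3 /path/to/dih_trigger.py delta-import` would request a delta import every 15 minutes; the path and interval are placeholders.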




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto Indexing in Solr

2013-07-25 Thread archit2112
I have to execute this command for a full import:

http://localhost:8983/solr/dataimport?command=full-import

Can you explain how I can use a Java timer to fire this HTTP request?
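
The same request can be fired on a timer from inside a long-running process. The sketch below uses Python's threading module rather than java.util.Timer, so treat it as an outline of the idea and not as Java code: one function sends the GET request, another repeats a task at a fixed interval until told to stop. The function names and the 30-second timeout are assumptions.

```python
import threading
import urllib.request

FULL_IMPORT_URL = "http://localhost:8983/solr/dataimport?command=full-import"


def fire(url=FULL_IMPORT_URL, opener=urllib.request.urlopen):
    """Send one GET request and return the HTTP status code."""
    with opener(url, timeout=30) as resp:
        return resp.status


def schedule_every(interval_s, task, stop_event):
    """Run task() every interval_s seconds on a background thread until stop_event is set."""
    def loop():
        # wait() returns False on timeout (time to run the task)
        # and True once a stop has been requested.
        while not stop_event.wait(interval_s):
            task()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

For example, `stop = threading.Event()` followed by `schedule_every(3600, fire, stop)` would request a full import roughly once an hour until `stop.set()` is called.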



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233p4080278.html


Timestamp compatibility while performing delta import in solr

2013-07-24 Thread archit2112
I'm new to Solr. I have successfully indexed an Oracle 10g XE database, and
I'm now trying to perform a delta import on it.
The delta query requires comparing a last_modified column of the table
with ${dih.last_index_time}.
However, my application has no such column, and I cannot add one.
I therefore used 'scn_to_timestamp(ora_rowscn)' to supply the required
timestamps. This returns a value of type timestamp, displayed as
24-JUL-13 12.42.32.0 PM, while dih.last_index_time has the format
2013-07-24 12:18:03. So I converted dih.last_index_time with
to_timestamp('${dih.last_index_time}',
'/MM/DD HH:MI:SS').
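
For reference, the two formats can be lined up as in the sketch below. It assumes an Oracle format mask that mirrors 2013-07-24 12:18:03 exactly (dash separators, 24-hour clock); the helper name to_timestamp_expr is made up. The 24-JUL-13 style is only how Oracle displays a TIMESTAMP under the session's NLS settings, so once both sides are real timestamps the display format should not matter for the comparison.

```python
from datetime import datetime

# Format of ${dih.last_index_time} as quoted above, e.g. 2013-07-24 12:18:03
DIH_FORMAT = "%Y-%m-%d %H:%M:%S"
# Assumed matching Oracle format mask (24-hour clock, dash separators)
ORACLE_MASK = "YYYY-MM-DD HH24:MI:SS"
# Display format of the Oracle TIMESTAMP quoted above, e.g. 24-JUL-13 12.42.32.0 PM
ORACLE_DISPLAY_FORMAT = "%d-%b-%y %I.%M.%S.%f %p"


def to_timestamp_expr(last_index_time):
    """Return an Oracle to_timestamp(...) expression for a DIH timestamp string."""
    # Raises ValueError if the string is not in the expected DIH format.
    datetime.strptime(last_index_time, DIH_FORMAT)
    return "to_timestamp('{}', '{}')".format(last_index_time, ORACLE_MASK)


last_index = datetime.strptime("2013-07-24 12:18:03", DIH_FORMAT)
row_changed = datetime.strptime("24-JUL-13 12.42.32.0 PM", ORACLE_DISPLAY_FORMAT)
# A row changed after the last import is one the delta query should pick up.
needs_reindex = row_changed > last_index
```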

My Data-config looks like this - 

<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@160.110.13.49:1521:xe" user="system"
              password="manager" />
  <document name="product_info">
    <entity name="PRODUCT" pk="PID" query="SELECT * FROM PRODUCT"
            deltaImportQuery="SELECT * FROM PRODUCT WHERE PID=${dih.delta.id}"
            deltaQuery="SELECT PID FROM PRODUCT WHERE scn_to_timestamp(ora_rowscn) &gt;
                        to_timestamp('${dih.last_index_time}', '/MM/DD HH:MI:SS')">
      <field column="PID" name="id" />
      <field column="PNAME" name="itemName" />
      <field column="INITQTY" name="itemQuantity" />
      <field column="REMQTY" name="remQuantity" />
      <field column="PRICE" name="itemPrice" />
      <field column="SPECIFICATION" name="specifications" />
      <entity name="SUB_CATEGORY"
              query="SELECT * FROM SUB_CATEGORY WHERE SCID=${PRODUCT.SCID}">
        <field column="SUBCATNAME" name="brand" />
        <entity name="CATEGORY"
                query="SELECT CNAME FROM CATEGORY WHERE CID=${SUB_CATEGORY.CID}">
          <field column="CNAME" name="itemCategory" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

However, this is not working and I'm getting the following error:
Unable to execute query: SELECT * FROM PRODUCT WHERE PID= Processing
Document # 1
Caused by: java.sql.SQLException: ORA-00936: missing expression

Please help me out!
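
The query in the error ends at "PID=", which suggests that ${dih.delta.id} resolved to nothing. The toy model below is not DIH's code; it assumes that an unknown variable is replaced by an empty string and that the delta row exposes the column under the name the deltaQuery selected (PID), so ${dih.delta.PID} would be the reference to try.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(template, variables):
    """Replace ${name} with its value; unknown names become the empty string."""
    return PLACEHOLDER.sub(lambda m: str(variables.get(m.group(1), "")), template)


# Assumption: the deltaQuery selected PID, so that is the only key available.
delta_row = {"dih.delta.PID": 101}

# Mirrors the config in the post: the variable name does not match the column.
broken = resolve("SELECT * FROM PRODUCT WHERE PID=${dih.delta.id}", delta_row)
# Same query with the variable named after the selected column.
fixed = resolve("SELECT * FROM PRODUCT WHERE PID=${dih.delta.PID}", delta_row)
```

Here `broken` comes out with nothing after "PID=", the same shape as the statement Oracle rejected with ORA-00936.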



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Timestamp-compatibility-while-performing-delta-import-in-solr-tp4079982.html


Indexing Oracle Database in Solr using Data Import Handler

2013-07-23 Thread archit2112
I'm trying to index an Oracle 10g XE database using Solr's Data Import Handler.

My data-config.xml looks like this:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@XXX.XXX.XXX.XXX::xe" user="XX"
              password="XX" />
  <document name="product_info">
    <entity name="product" query="select * from product">
      <field column="pid" name="id" />
      <field column="pname" name="itemName" />
      <field column="initqty" name="itemQuantity" />
      <field column="remQty" name="remQuantity" />
      <field column="price" name="itemPrice" />
      <field column="specification" name="specifications" />
    </entity>
  </document>
</dataConfig>

My schema.xml looks like this -

<field name="id" type="text_general" indexed="true" stored="true"
       required="true" multiValued="false" />
<field name="itemName" type="text_general" indexed="true" stored="true"
       multiValued="true" omitNorms="true" termVectors="true" />
<field name="itemQuantity" type="text_general" indexed="true"
       stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="remQuantity" type="text_general" indexed="true"
       stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="itemPrice" type="text_general" indexed="true" stored="true"
       multiValued="true" omitNorms="true" termVectors="true" />
<field name="specifications" type="text_general" indexed="true"
       stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="brand" type="text_general" indexed="true" stored="true"
       multiValued="true" omitNorms="true" termVectors="true" />
<field name="itemCategory" type="text_general" indexed="true"
       stored="true" multiValued="true" omitNorms="true" termVectors="true" />

Now when I try to index it, Solr is not able to read the columns of the
table, and indexing fails. It says that the document is missing the
unique key id, which, as you can see, is clearly present in the document.
Also, when such an exception is thrown, the log normally shows which
fields were picked up by the document. In this case, however, no fields
are being read at all.

But if I change my query, everything works perfectly. The modified
data-config.xml:

<dataConfig>
  <dataSource name="db1" type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@XXX.XXX.XX.XX::xe" user=""
              password="X" />
  <document name="product_info">
    <entity name="products" dataSource="db1"
            query="select pid as id, pname as itemName, initqty as itemQuantity,
                   remqty as remQuantity, price as itemPrice,
                   specification as specifications from product">
      <field column="id" name="id" />
      <field column="itemName" name="itemName" />
      <field column="itemQuantity" name="itemQuantity" />
      <field column="remQuantity" name="remQuantity" />
      <field column="itemPrice" name="itemPrice" />
      <field column="specifications" name="specifications" />
    </entity>
  </document>
</dataConfig>

Why is this happening? How do I solve it? How does giving an alias affect
the indexing process? Thanks in advance.
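
One plausible explanation, offered as a guess and not a confirmed diagnosis: Oracle reports unquoted column names in upper case (PID, PNAME, ...), so a lookup for the lower-case names used in the first config finds nothing. The sketch below only illustrates that effect with an exact, case-sensitive lookup; it is not DIH's code, and map_row is a made-up helper. With aliases, the returned labels differ from the schema field names only by case, which may be why DIH can still match them.

```python
def map_row(row, column_to_field):
    """Copy row values into a document using an exact, case-sensitive column lookup."""
    return {field: row[col] for col, field in column_to_field.items() if col in row}


# Assumption: the JDBC driver hands back upper-case labels for unquoted columns.
row = {"PID": 1, "PNAME": "Widget"}

# Lower-case column names, as in the first config: nothing is picked up.
lower = map_row(row, {"pid": "id", "pname": "itemName"})
# Upper-case column names: every field is found.
upper = map_row(row, {"PID": "id", "PNAME": "itemName"})
```

If this is the cause, writing the column attributes in upper case (column="PID" and so on) would be the change to try.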




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Oracle-Database-in-Solr-using-Data-Import-Handler-tp4079649.html


Index mysql database using data import handler in solr

2013-07-11 Thread archit2112
I want to index a MySQL database in Solr using the Data Import Handler.

I have made two tables. The first table holds the metadata of a file.

create table filemetadata (
id varchar(20) primary key ,
filename varchar(50),
path varchar(200),
size varchar(10),
author varchar(50)
) ;

+----+----------+----------+------+--------+
| id | filename | path     | size | author |
+----+----------+----------+------+--------+
| 1  | abc.txt  | c:\files | 2kb  | eric   |
| 2  | xyz.docx | c:\files | 5kb  | john   |
| 3  | pqr.txt  | c:\files | 10kb | mike   |
+----+----------+----------+------+--------+

The second table contains the favourite info about a particular file in
the above table.

create table filefav (
fid varchar(20) primary key ,
id varchar(20),
favouritedby varchar(300),
favouritedtime varchar(10),
FOREIGN KEY (id) REFERENCES filemetadata(id) 
) ;

+-----+----+--------------+----------------+
| fid | id | favouritedby | favouritedtime |
+-----+----+--------------+----------------+
| 1   | 1  | ross         | 22:30          |
| 2   | 1  | josh         | 12:56          |
| 3   | 2  | johny        | 03:03          |
| 4   | 2  | sean         | 03:45          |
+-----+----+--------------+----------------+

Here id is a foreign key. The second table shows which person has
marked which document as a favourite. E.g. the file abc.txt, represented
by id = 1, has been marked as a favourite (see column favouritedby) by ross and
josh.


I want to index the files as follows:

each document should have the following fields

id   - to be taken from the first table filemetadata
filename - to be taken from the first table filemetadata
path - to be taken from the first table filemetadata
size - to be taken from the first table filemetadata
author   - to be taken from the first table filemetadata
Favouritedby - this field should contain the names of all the people
from table 2 filefav (from the favouritedby column) who like that particular
file.

eg after indexing doc 1 should have

id = 1
filename = abc.txt
path = c:\files
size = 2kb
author = eric
favouritedby = ross, josh

How Do I achieve this? 
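
The target shape can be sketched outside Solr first. The example below uses an in-memory SQLite database as a stand-in for MySQL, loaded with the sample rows above, and runs one child query per parent row, which is roughly what a nested DIH entity does. The function name build_documents is made up. In Solr terms, the field receiving the names would need to be multiValued.

```python
import sqlite3

# In-memory stand-in for the MySQL database, loaded with the sample rows from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filemetadata (id TEXT PRIMARY KEY, filename TEXT, "
             "path TEXT, size TEXT, author TEXT)")
conn.execute("CREATE TABLE filefav (fid TEXT PRIMARY KEY, id TEXT, "
             "favouritedby TEXT, favouritedtime TEXT)")
conn.executemany("INSERT INTO filemetadata VALUES (?, ?, ?, ?, ?)", [
    ("1", "abc.txt", r"c:\files", "2kb", "eric"),
    ("2", "xyz.docx", r"c:\files", "5kb", "john"),
    ("3", "pqr.txt", r"c:\files", "10kb", "mike"),
])
conn.executemany("INSERT INTO filefav VALUES (?, ?, ?, ?)", [
    ("1", "1", "ross", "22:30"),
    ("2", "1", "josh", "12:56"),
    ("3", "2", "johny", "03:03"),
    ("4", "2", "sean", "03:45"),
])


def build_documents(conn):
    """One document per file; favouritedby collects every matching filefav row."""
    parents = conn.execute(
        "SELECT id, filename, path, size, author FROM filemetadata ORDER BY id"
    ).fetchall()
    docs = []
    for file_id, filename, path, size, author in parents:
        # Child query, run once per parent row, like a nested entity.
        fans = [row[0] for row in conn.execute(
            "SELECT favouritedby FROM filefav WHERE id = ? ORDER BY fid", (file_id,))]
        docs.append({"id": file_id, "filename": filename, "path": path,
                     "size": size, "author": author, "favouritedby": fans})
    return docs


docs = build_documents(conn)
```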

I have written a data-config.xml (which is not giving the desired result) as
follows

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test" user="root" password="root" />
  <document name="filemetadata">

    <entity name="restaurant" query="select * from filemetadata">
      <field column="id" name="id" />

      <entity name="filefav" query="select favouritedby from filefav where
              id=${filemetadata.id}">
        <field column="favouritedby" name="favouritedby1" />
      </entity>

      <field column="filename" name="name1" />
      <field column="path" name="path1" />
      <field column="size" name="size1" />
      <field column="author" name="author1" />

    </entity>
  </document>
</dataConfig>

Can anyone explain how I can achieve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-mysql-database-using-data-import-handler-in-solr-tp4077205.html


Indexing database in Solr using Data Import Handler

2013-07-10 Thread archit2112

I'm trying to index a MySQL database using the Data Import Handler in Solr.

I have made two tables. The first table holds the metadata of a file.

create table filemetadata (
id varchar(20) primary key ,
filename varchar(50),
path varchar(200),
size varchar(10),
author varchar(50)
) ;

The second table contains the favourite info about a particular file in
the above table.

create table filefav (
fid varchar(20) primary key ,
id varchar(20),
favouritedby varchar(300),
favouritedtime varchar(10),
FOREIGN KEY (id) REFERENCES filemetadata(id) 
) ;

As you can see, id is a foreign key.

To index this I have written the following data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test" user="root" password="root" />
  <document name="filemetadata">

    <entity name="restaurant" query="select * from filemetadata">
      <field column="id" name="id" />

      <entity name="filefav" query="select favouritedby from filefav where id=
              '${filemetadata.id}'">
        <field column="favouritedby" name="favouritedby1" />
      </entity>

      <field column="filename" name="name1" />
      <field column="path" name="path1" />
      <field column="size" name="size1" />
      <field column="author" name="author1" />

    </entity>

  </document>
</dataConfig>

Everything is working, but the favouritedby1 field is not getting indexed,
i.e. that field does not exist when I run the *:* query. Can you please help
me out?
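
One thing worth checking: the child query refers to ${filemetadata.id}, but the parent entity is named restaurant (filemetadata is the document name and the table name). As far as I understand, DIH variables are addressed by entity name, so ${restaurant.id} would be the reference to try. The sketch below is a small, hypothetical lint for this kind of mismatch, run against a cut-down copy of the config; the BUILT_IN set is an assumption and may be incomplete.

```python
import re
import xml.etree.ElementTree as ET

VARIABLE = re.compile(r"\$\{([A-Za-z_]\w*)\.")
BUILT_IN = {"dih", "dataimporter"}  # assumed namespaces provided by DIH itself


def unknown_entity_refs(config_xml):
    """Return (entity, referenced_name) pairs whose ${name.column} matches no entity name."""
    root = ET.fromstring(config_xml)
    names = {e.get("name") for e in root.iter("entity")}
    problems = []
    for entity in root.iter("entity"):
        for ref in VARIABLE.findall(entity.get("query", "")):
            if ref not in names and ref not in BUILT_IN:
                problems.append((entity.get("name"), ref))
    return problems


# Cut-down copy of the config from the post (fields omitted).
CONFIG = """
<dataConfig>
  <document name="filemetadata">
    <entity name="restaurant" query="select * from filemetadata">
      <entity name="filefav"
              query="select favouritedby from filefav where id='${filemetadata.id}'"/>
    </entity>
  </document>
</dataConfig>
"""

problems = unknown_entity_refs(CONFIG)
```

For this config, `problems` holds a single pair naming the filefav entity and the unknown prefix filemetadata.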




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-database-in-Solr-using-Data-Import-Handler-tp4077180.html


Extract file name (without extension) while indexing using Data Import Handler in Solr

2013-07-03 Thread archit2112
I'm successfully able to index pdf, doc, ppt, etc. files using the Data Import
Handler in Solr 4.3.0.

My data-config.xml looks like this:

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="C:\Users\aroraarc\Desktop\Impdo"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)|(pptx)|(xls)|(xlsx)|(txt)"
            onError="skip"
            recursive="true">

      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastmodified" />
      <field column="file" name="fileName" />

      <entity name="tika-test" dataSource="bin"
              processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip">
        <field column="Author" name="author" meta="true" />
        <field column="title" name="title" meta="true" />
        <field column="text" name="content" />

      </entity>
    </entity>
  </document>
</dataConfig>

However, in the fileName field I want to insert the bare file name without
the extension. E.g. instead of 'HelloWorld.txt' I want only 'HelloWorld' to
be inserted in the fileName field. How do I achieve this?
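
The transformation itself is small. The sketch below shows it twice in Python, once with os.path.splitext and once with a regular expression that removes the final dot and everything after it; the helper names are made up. Inside DIH, a transformer on the entity (RegexTransformer or ScriptTransformer, for instance) would be the likely place for the same logic, though the exact wiring is not shown here.

```python
import os
import re

# Final dot plus the characters after it, up to the end of the name.
EXTENSION = re.compile(r"\.[^.\\/]+$")


def base_name(file_name):
    """Strip the last extension using the standard library."""
    return os.path.splitext(file_name)[0]


def base_name_regex(file_name):
    """Strip the last extension using a regular expression."""
    return EXTENSION.sub("", file_name)
```

Both helpers only remove the last extension, so a name like archive.tar.gz becomes archive.tar.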

Thanks in advance!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Extract-file-name-without-extension-while-indexing-using-Data-Import-Handler-in-Solr-tp4074991.html


Re: Unique key error while indexing pdf files

2013-07-02 Thread archit2112
Can you please suggest a way (with example) of assigning this unique key to a
pdf file?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074588.html


Re: Unique key error while indexing pdf files

2013-07-02 Thread archit2112
Okay. Can you please suggest a way (with an example) of assigning this unique
key to a pdf file? Say, a unique number for each pdf file. How do I achieve
this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074592.html


Re: Unique key error while indexing pdf files

2013-07-02 Thread archit2112
Yes. The absolute path is unique.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074620.html


Removal of unique key - Query Elevation Component

2013-07-02 Thread archit2112

I want to index pdf files in solr 4.3.0 using the data import handler.

I have done the following:

My request handler -

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

My data-config.xml

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="C:\Users\aroraarc\Desktop\Impdo" fileName=".*pdf"
            recursive="true">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text">
        <field column="Author" name="author" meta="true" />
        <field column="title" name="title" meta="true" />
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>

Now when I tried to index the documents I got the following error:

org.apache.solr.common.SolrException: Document is missing mandatory
uniqueKey field: id

Because I don't want any unique key in my case, I disabled it as follows.

In solrconfig.xml I commented out:

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <!-- pick a fieldType to analyze queries -->
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

In schema.xml I commented out <uniqueKey>id</uniqueKey>

and added

<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" />

and in elevate.xml I made the following changes:

<elevate>
  <query text="foo bar">
    <doc id="4602376f-9741-407b-896e-645ec3ead457" />
  </query>
</elevate>

When I do this, indexing takes place, but the indexed docs contain only
author, author_s and id fields. The documents should contain author, text,
title and id fields (as defined in my data-config.xml). Please help me out.
Am I doing anything wrong? And where did this author_s field come from?

<doc>
  <str name="author">arora arc</str>
  <str name="author_s">arora arc</str>
  <str name="id">4f65332d-49d9-497a-b88b-881da618f571</str>
</doc>
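
An extra field whose value duplicates another one often comes from a copyField rule in schema.xml. The sketch below assumes a schema fragment resembling the stock example schema, which copies author into author_s; the fragment is illustrative, not the poster's actual file, and copy_targets is a made-up helper that lists where a given source field is copied.

```python
import xml.etree.ElementTree as ET

# Assumption: a fragment modelled on the stock example schema, not the poster's file.
SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="author" type="text_general" indexed="true" stored="true"/>
    <field name="author_s" type="string" indexed="true" stored="true"/>
  </fields>
  <copyField source="author" dest="author_s"/>
</schema>
"""


def copy_targets(schema_xml, source):
    """Return the dest of every copyField rule whose source is the given field."""
    root = ET.fromstring(schema_xml)
    return [c.get("dest") for c in root.iter("copyField") if c.get("source") == source]


targets = copy_targets(SCHEMA, "author")
```

Searching the real schema.xml for copyField entries with source="author" would confirm or rule this out.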





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Removal-of-unique-key-Query-Elevation-Component-tp4074624.html


Re: Removal of unique key - Query Elevation Component

2013-07-02 Thread archit2112
Thanks! The author_s issue has been resolved. 
Why are the other fields not getting indexed ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Removal-of-unique-key-Query-Elevation-Component-tp4074624p4074636.html


Re: Unique key error while indexing pdf files

2013-07-02 Thread archit2112
Yes. The absolute path is unique. How do I implement it? Can you please
explain?
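
Since the absolute path is unique, the path itself (or something derived from it) can serve as the key; in DIH that would presumably mean mapping the fileAbsolutePath column to the id field. The sketch below only illustrates why a path-derived id is convenient: it stays the same across re-imports, whereas a random UUID changes every time. The helper names are made up.

```python
import uuid


def stable_id(absolute_path):
    """Derive an id from the path: the same path always yields the same id."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, absolute_path))


def random_id():
    """Generate a fresh random id: a re-imported file would get a new one each time."""
    return str(uuid.uuid4())
```

With a stable id, importing the same file twice overwrites the earlier document instead of adding a duplicate.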



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074638.html


Index pdf files.

2013-07-01 Thread archit2112
Hi, I'm new to Solr. I want to index pdf files using the Data Import Handler.
I'm using Solr 4.3.0. I followed the steps given in this post:

http://lucene.472066.n3.nabble.com/indexing-with-DIH-and-with-problems-td3731129.html

However, I get the following error -

Full Import failed:java.lang.NoClassDefFoundError:
org/apache/tika/parser/Parser

Please help!

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278.html


Re: Index pdf files.

2013-07-01 Thread archit2112
Hi 

Thanks a lot. I did what you said. Now I'm getting the following error.

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
java.util.regex.PatternSyntaxException: Dangling meta character '*' near
index 0



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278p4074297.html


Re: Index pdf files.

2013-07-01 Thread archit2112
I figured it out. It was a problem with the regular expression I used in
data-config.xml.
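
The post does not say which expression was at fault. A common way to get a "dangling meta character '*'" error is to write a shell-style glob such as *.pdf where a regular expression is expected, so the sketch below assumes that case. Python's re module rejects the glob for the same reason Java's regex engine does: the leading * has nothing to repeat.

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches_pdf(file_name, pattern=r".*\.pdf$"):
    """Check a file name against a regex that accepts names ending in .pdf."""
    return re.search(pattern, file_name) is not None
```

Here `is_valid_regex("*.pdf")` is False, while the regex form `.*\.pdf$` compiles and matches names such as report.pdf.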




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278p4074304.html


Unique key error while indexing pdf files

2013-07-01 Thread archit2112
Hi

I'm trying to index pdf files in Solr 4.3.0 using the Data Import Handler.

*My request handler - *

<requestHandler name="/dataimport1"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config1.xml</str>
  </lst>
</requestHandler>

*My data-config1.xml *

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="C:\Users\aroraarc\Desktop\Impdo" fileName=".*pdf"
            recursive="true">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text">
        <field column="Author" name="author" meta="true" />
        <field column="title" name="title1" meta="true" />
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>


Now when I try to index the files, I get the following error:

org.apache.solr.common.SolrException: Document is missing mandatory
uniqueKey field: id
    at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:88)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:517)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)


This problem can be solved easily in the case of database indexing, but I
don't know how to go about the unique key of a document. How do I define the
id field (unique key) of a pdf file? How do I solve this problem?

Thanks in advance




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314.html


Re: Unique key error while indexing pdf files

2013-07-01 Thread archit2112
I'm new to Solr. I'm just trying to understand and explore the various features
offered by Solr and their implementations. I would be very grateful if you
could solve my problem with any example of your choice. I just want to learn
how I can index pdf documents using the Data Import Handler.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074327.html