Solr Schema and how?

2011-10-04 Thread caman
Hello all,

We have a screen-builder application where users design their own forms.
They have a choice of creating form fields of type date, text, number, large
text, etc., with up to a total of 500 fields supported on a screen.
Once screens are designed, the system automatically handles type checking for
valid data entries on the front end, even though data of any type gets stored as
text.
So as you can imagine, the table is huge, with 600+
columns (screenId, recordId, field1 ... field500), and every column is set as
'text'. The same table stores data for every screen designed in the system.

So basically, here are my questions:

1. How best to index it? I did it using the dynamic field 'field*', which works
great.
2. Since everything is text, I am not sure how to enable filtering on each field,
e.g. if a user wants to enable 'greater than' or 'less than' queries on a number
field (stored as text). Somehow that data needs to be stored as a number in SOLR,
but I don't think I have a way to do that, since 'field2' may be a 'number' field
for 'screen1' and a 'date' for 'screen2'.
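One approach that would allow typed range queries (a sketch, not from the
original thread: it assumes the indexer can append a type suffix to the field
name based on the screen's definition; the suffixes and types are the ones from
the Solr example schema and are illustrative):

  <!-- schema.xml: one dynamic field per data type; the feeder picks the
       suffix that matches the screen's definition of that field -->
  <dynamicField name="field*_s"  type="string" indexed="true" stored="true"/>
  <dynamicField name="field*_i"  type="tint"   indexed="true" stored="true"/>
  <dynamicField name="field*_dt" type="tdate"  indexed="true" stored="true"/>

With that, screen1's field2 would be indexed as field2_i and screen2's as
field2_dt, so a range query like field2_i:[10 TO *] works per screen.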




Would appreciate any ideas on how to handle this.



thanks



Re: how can i develop client application with solr url using javascript?

2011-08-22 Thread caman
Search for 'ajax-solr' on Google. To handle the Solr URL from the browser, look
at setting up a proxy.
Good luck.



Re: Solr 4.0 => Spatial Search - How to

2011-01-13 Thread caman

Thanks.
Here was the issue: concatenating the two floats (lat, lng) on the MySQL end
converted the result to a BLOB, and indexing would fail when storing a BLOB in a
'location'-type field. After the BLOB issue was resolved, all worked OK.
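For reference, a sketch of one way to keep the concatenation a string (this
assumes MySQL and a DIH entity; the table and column names are illustrative):

  <!-- data-config.xml: CAST before CONCAT so the driver reports a
       character string instead of a BLOB -->
  <entity name="place"
          query="SELECT id,
                        CONCAT(CAST(lat AS CHAR), ',', CAST(lng AS CHAR)) AS coord
                 FROM places"/>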

Thank you all for your help





Solr 4.0 => Spatial Search - How to

2011-01-12 Thread caman

OK, this could be very easy to do, but I was not able to do it.
I need to enable location search, i.e. if someone searches for the location 'New
York' => show results for New York and results within 50 miles of New York.
We do have latitude/longitude stored in the database for each record, but I am
not sure how to index these values to enable spatial search.
Any help would be much appreciated.

thanks


Re: Solr 4.0 => Spatial Search - How to

2011-01-12 Thread caman

Adam,

Thanks. Yes, that helps,
but how does the coord field get populated? All I have is:

<field name="lat" type="tdouble" indexed="true" stored="true"/>
<field name="lng" type="tdouble" indexed="true" stored="true"/>

<field name="coord" type="location" indexed="true" stored="true"/>

Fields 'lat' and 'lng' get populated by the DataImportHandler, but coord, I am
not sure?
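One way to fill it (a sketch, assuming DIH; the entity and column names are
illustrative) is to assemble the lat,lng pair with a TemplateTransformer in
data-config.xml:

  <entity name="place" transformer="TemplateTransformer"
          query="SELECT id, lat, lng FROM places">
    <!-- builds the "lat,lng" string the location field type expects -->
    <field column="coord" template="${place.lat},${place.lng}"/>
  </entity>

Alternatively, the SQL itself can return the concatenated pair, which is what
the resolution above ended up doing.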

Thanks


RE: DIH and denormalizing

2010-06-28 Thread caman

In your query query="SELECT webtable AS wt FROM ncdat_wt WHERE
featurecode='${ncdat.feature}'", instead of ${ncdat.feature} use
${dataTable.feature}, where dataTable is your parent entity's name: the
variable must reference the entity name, not the table name.
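Applied to the config quoted below, the sub-entity would read (a sketch of just
the changed line):

  <entity name="ncdat_wt"
          query="SELECT webtable AS wt FROM ncdat_wt
                 WHERE featurecode='${dataTable.feature}'"/>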

 

 

 

From: Shawn Heisey-4 [via Lucene]
Sent: Monday, June 28, 2010 2:24 PM
To: caman
Subject: DIH and denormalizing

 

I am trying to do some denormalizing with DIH from a MySQL source.
Here's part of my data-config.xml:

<entity name="dataTable" pk="did"
        query="SELECT *, FROM_UNIXTIME(post_date) AS pd FROM ncdat WHERE
               did > ${dataimporter.request.minDid} AND
               did <= ${dataimporter.request.maxDid} AND
               (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})">
  <entity name="ncdat_wt"
          query="SELECT webtable AS wt FROM ncdat_wt WHERE
                 featurecode='${ncdat.feature}'"/>
</entity>

The relationship between features in ncdat and webtable in ncdat_wt (via
featurecode) will be many-to-many. The wt field in schema.xml is set up
as multivalued.

It seems that ${ncdat.feature} is not being set. I saw a query
happening on the server and it was SELECT webtable as wt FROM ncdat_wt
WHERE featurecode='' - that last part is an empty string with single
quotes around it. From what I can tell, there are no entries in ncdat
where feature is blank. I've tried this with both a 1.5-dev checked out
months ago (which we are using in production) and a 3.1-dev checked out
today.

Am I doing something wrong?

Thanks, 
Shawn 





 




RE: Can solr return pretty text as the content?

2010-06-23 Thread caman

Define 'pretty text'.

 

1) Are you talking about the XML/JSON returned by SOLR not being pretty?

If yes, try indent=on with your query params.
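For example (host, port, and core are illustrative):

  http://localhost:8983/solr/select?q=*:*&wt=json&indent=on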

 

2) Or are you talking about the data in a certain field?

SOLR returns what you feed it. Look at your filters for that field
type; your filters/tokenizer may be stripping the formatting.

 

 

 

From: JohnRodey [via Lucene]
Sent: Wednesday, June 23, 2010 1:19 PM
To: caman
Subject: Can solr return pretty text as the content?

 

When I feed pretty text into Solr for indexing from Lucene and search for
it, the content is always returned as one long line of text. Is there a way
for Solr to return the pretty formatted text to me?


 




RE: Stemmed and/or unStemmed field

2010-06-23 Thread caman

Ahh, perfect.

Will take a look. Thanks.

 

From: Robert Muir [via Lucene]
Sent: Wednesday, June 23, 2010 4:17 PM
To: caman
Subject: Re: Stemmed and/or unStemmed field

 

On Wed, Jun 23, 2010 at 3:58 PM, Vishal A. [hidden email] wrote:

 Here is what I am trying to do: someone clicks on 'Comforters & Pillows',
 and we would want the results to be filtered to where the title has the
 keyword 'Comforter' or 'Pillows', but we have been getting results with the
 word 'comfort' in the title. I assume it is because of stemming. What is the
 right way to handle this?
 

from your examples, it seems a more lightweight stemmer might be an easy 
option: https://issues.apache.org/jira/browse/LUCENE-2503

-- 
Robert Muir 
[hidden email] 




 




RE: JSON formatted response from SOLR question....

2010-05-10 Thread caman

Take a look at the AjaxSolr source code:

http://github.com/evolvingweb/ajax-solr

This should give you exactly what you need.
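(For the immediate question below: in JavaScript you can index an object with a
variable using bracket notation, e.g.
rsp.facet_counts.facet_fields[facetName].length, so the facet name can be
passed into the loop as a parameter.)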

 

thanks

 

 

 

 

From: Tod [via Lucene]
Sent: Monday, May 10, 2010 7:22 AM
To: caman
Subject: JSON formatted response from SOLR question

 

I apologize, this is such a JSON/javascript question but I'm stuck and 
am not finding any resources that address this specifically. 

I'm doing a faceted search and getting back in my
facet_counts.facet_fields response an array of countries.  I'm
gathering the count of the array elements returned using this notation: 

rsp.facet_counts.facet_fields.country.length 

... where rsp is the eval'ed JSON response from SOLR.  From there I just 
loop through listing the individual country with its associated count. 

The problem I am having is trying to automate this to loop through any 
one of a number of facets contained in my JSON response, not just 
country.  So instead of the above I would have something like: 

rsp.facet_counts.facet_fields.VARIABLE.length 

... where VARIABLE would be the name of one of the facets passed into a 
javascript function to perform the loop.  None of the javascript
examples I can find seem to address this.  Has anyone run into this?
Is there a better list to ask this question? 


Thanks in advance. 




 




RE: DIH full-import memory issue

2010-05-10 Thread caman

This may help:

batchSize: the batch size used for the JDBC connection.

 

http://wiki.apache.org/solr/DataImportHandler#Configuring_DataSources
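A sketch of where it goes (the URL and credentials are illustrative). With the
MySQL driver, batchSize="-1" makes DIH request row-by-row streaming instead of
buffering the whole result set in memory:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user" password="pass"
              batchSize="-1"/>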

 

 

 

 

From: Geek Gamer [via Lucene]
Sent: Monday, May 10, 2010 9:42 PM
To: caman
Subject: DIH full-import memory issue

 

Hi,

I am facing issues with DIH full-import.

I have a database with 3 million records that will translate into an index size
of 6GB.

When I try to do a full import, I get an out-of-memory error like this:

INFO: Starting Full Import
May 10, 2010 11:44:06 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
WARNING: Unable to read: dataimport.properties
May 10, 2010 11:44:06 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
May 10, 2010 11:44:06 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
commit{dir=/home/search/SOLR/solr/data/index,segFN=segments_1,version=1273549043650,generation=1,filenames=[segments_1]
May 10, 2010 11:44:06 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1273549043650
May 10, 2010 11:44:06 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity offer with URL:
jdbc:mysql://domU-12-31-39-10-59-01.compute-1.internal/jounce1
May 10, 2010 11:44:07 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 301

Exception in thread Timer-1 java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.newValueIterator(HashMap.java:843)
        at java.util.HashMap$Values.iterator(HashMap.java:910)
        at org.mortbay.jetty.servlet.HashSessionManager.scavenge(HashSessionManager.java:180)
        at org.mortbay.jetty.servlet.HashSessionManager.access$000(HashSessionManager.java:36)
        at org.mortbay.jetty.servlet.HashSessionManager$1.run(HashSessionManager.java:144)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
May 10, 2010 11:54:54 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:1621)
        at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1398)
        at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2816)
        at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467)
        at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510)
        at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2536)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2465)
        at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:734)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
        ... 5 more
May 10, 2010 11:54:54 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
May 10, 2010 11:54:54 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback




I tried allocating 4 gigs of memory to the VM, but no luck.
Are the records cached before indexing, or streamed?
Any pointers to documents?

thanks in anticipation, 
umar 




 



RE: Embedded Solr search query

2010-05-07 Thread caman

Why not write a custom request handler that can parse, split, execute, and
combine the results of your queries?
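A sketch of the wiring in solrconfig.xml (the handler class name is
hypothetical, not an existing Solr class; you would implement it yourself, e.g.
on top of Solr's RequestHandlerBase):

  <!-- com.example.MultiQueryHandler is illustrative: it would run the
       sub-queries and merge their responses into one result -->
  <requestHandler name="/multi" class="com.example.MultiQueryHandler"/>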

 

 

 

From: Eric Grobler [via Lucene]
Sent: Friday, May 07, 2010 1:01 AM
To: caman
Subject: Embedded Solr search query

 

Hello Solr community, 

When a user searches on our web page, we need to run 3 related but different
queries.
For SEO reasons, we cannot use Ajax, so at the moment we run the 3 queries
sequentially inside a PHP script.
Although Solr is super fast, the extra network overhead can make the 3
queries 400ms slower than they need to be.

Thus my question is:
Is there a way to send one query string to Solr with 2 or more
embedded search queries, where Solr will split and execute the queries and
return the results of the multiple searches in one go?

In other words, instead of: 
-  send searchQuery1 
   get result1 
-  send searchQuery2 
   get result2 
... 

you run: 
- send searchQuery1+searchQuery2 
- get result1+result2 

Thanks and Regards 
Eric 




 




RE: Help indexing PDF files

2010-05-07 Thread caman

Take a look at the Tika library.
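Solr also ships with a Tika-backed extracting handler (Solr Cell); in the 1.4
example solrconfig.xml it is registered along these lines (a sketch; the field
mapping is illustrative):

  <requestHandler name="/update/extract" startup="lazy"
                  class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <!-- map Tika's extracted body into the schema's text field -->
      <str name="fmap.content">text</str>
    </lst>
  </requestHandler>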

 

From: Leonardo Azize Martins [via Lucene]
Sent: Friday, May 07, 2010 6:37 AM
To: caman
Subject: Help indexing PDF files

 

Hi, 

I am new to Solr.
I would like to index some PDF files.

How can I do this using the example schema from version 1.4.0?

Regards, 
Leo 




 




RE: Embedded Solr search query

2010-05-07 Thread caman

I would just look at the SOLR source code and see how the standard search
handler and the dismax search handler are implemented.

Look under the 'org.apache.solr.handler' package:
http://hudson.zones.apache.org/hudson/job/Solr-trunk/clover/org/apache/solr/handler/pkg-summary.html

 

 

 

From: Eric Grobler [via Lucene]
Sent: Friday, May 07, 2010 1:33 AM
To: caman
Subject: Re: Embedded Solr search query

 

Hi Caman,

I was hoping someone had done it already :-)
I am also new to Solr/Lucene; can you perhaps point me to a request handler
example page?

Thanks and Regards
Eric

On Fri, May 7, 2010 at 9:05 AM, caman [hidden email] wrote:


 
 Why not write a custom request handler which can parse, split, execute and
 combine results to your queries?
 
 
 
 
 
 
 
 From: Eric Grobler [via Lucene]
 Sent: Friday, May 07, 2010 1:01 AM 
 To: caman 
 Subject: Embedded Solr search query 
 
 
 
 Hello Solr community, 
 
 When a user searches on our web page, we need to run 3 related but different
 queries.
 For SEO reasons, we cannot use Ajax, so at the moment we run the 3 queries
 sequentially inside a PHP script.
 Although Solr is super fast, the extra network overhead can make the 3
 queries 400ms slower than they need to be.
 
 Thus my question is:
 Is there a way to send one query string to Solr with 2 or more
 embedded search queries, where Solr will split and execute the queries and
 return the results of the multiple searches in one go?
 
 In other words, instead of: 
 -  send searchQuery1 
   get result1 
 -  send searchQuery2 
   get result2 
 ... 
 
 you run: 
 - send searchQuery1+searchQuery2 
 - get result1+result2 
 
 Thanks and Regards 
 Eric 
 
 
 
 
 
 
 
 

 


 




RE: run on reboot on windows

2010-05-02 Thread caman

Ahmed,

Best is if you take a look at the documentation for Jetty or Tomcat. SOLR can
run on any web container; it's up to you how you configure your web
container to run.

 

Thanks

Aboxy

 

 

 

 

 

From: S Ahmed [via Lucene]
Sent: Sunday, May 02, 2010 4:33 PM
To: caman
Subject: Re: run on reboot on windows

 

By default it uses Jetty, so you're saying Tomcat on Windows Server 2008 / IIS7
runs as a native Windows service?

On Sun, May 2, 2010 at 12:46 AM, Dave Searle [hidden email] wrote:


 Set the tomcat6 service to auto-start on boot (if running Tomcat)
 
 Sent from my iPhone 
 
 On 2 May 2010, at 02:31, S Ahmed [hidden email] wrote: 
 
  Hi, 
  
  I'm trying to get Solr to run on windows, such that if it reboots 
  the Solr 
  service will be running. 
  
  How can I do this? 
 

 


 




RE: run on reboot on windows

2010-05-02 Thread caman

Please take a look at this for Tomcat:

http://tomcat.apache.org/tomcat-6.0-doc/setup.html#Windows

and this for Jetty:

http://docs.codehaus.org/display/JETTY/Win32Wrapper

 

 

Hope this helps.

 

From: S Ahmed [via Lucene]
Sent: Sunday, May 02, 2010 4:44 PM
To: caman
Subject: Re: run on reboot on windows

 

It's not Tomcat/Jetty that's the issue; it's how to get things to restart on
a Windows server (Tomcat and Jetty don't run as native Windows services), so
I am a little confused. Thanks.

On Sun, May 2, 2010 at 7:37 PM, caman [hidden email] wrote:


 
 Ahmed, 
 
 
 
 Best is if you take a look at the documentation of jetty or tomcat. SOLR 
 can 
 run on any web container, it's up to you how you  configure your web 
 container to run 
 
 
 
 Thanks 
 
 Aboxy 
 
 
 
 
 
 
 
 
 
 
 
 From: S Ahmed [via Lucene]
 Sent: Sunday, May 02, 2010 4:33 PM 
 To: caman 
 Subject: Re: run on reboot on windows 
 
 
 
 By default it uses Jetty, so your saying Tomcat on windows server 2008/ 
 IIS7 
 
 runs as a native windows service? 
 
 On Sun, May 2, 2010 at 12:46 AM, Dave Searle [hidden email]wrote: 
 
 
  Set the tomcat6 service to auto-start on boot (if running Tomcat)
  
  Sent from my iPhone 
  
  On 2 May 2010, at 02:31, S Ahmed [hidden email] wrote: 
  
   Hi, 
   
   I'm trying to get Solr to run on windows, such that if it reboots 
   the Solr 
   service will be running. 
   
   How can I do this? 
  
 
 
 
 
 
 
 
 

 


 




RE: Only one field in the result

2010-04-28 Thread caman

I think you are looking for the 'fl' parameter.
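For example (host, port, and field names illustrative), this returns only the
id field for each hit:

  http://localhost:8983/solr/select?q=*:*&fl=id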

 

 

From: pcmanprogrammeur [via Lucene]
Sent: Wednesday, April 28, 2010 12:38 AM
To: caman
Subject: Only one field in the result

 

Hello, 

In my schema.xml, I have some fields stored and indexed. However, in a
particular case, I would like to get only one field in my XML result! Is it
possible?

Thanks for your help ! 


 




RE: Problem with DataImportHandler and embedded entities

2010-04-21 Thread caman

Are you storing the comment field or indexing it?

A field declared with stored="false" will not appear in the document.

 

From: Jason Rutherglen [via Lucene]
Sent: Wednesday, April 21, 2010 10:15 AM
To: caman
Subject: Problem with DataImportHandler and embedded entities

 

I'm using the following data-config.xml with DataImportHandler. I've
never used embedded entities before, however I'm not seeing the comment
show up in the document... I'm not sure what's up.

<dataConfig>
  <dataSource type="JdbcDataSource" name="ch"
              driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/ch"
              batchSize="-1" user="ch" password="ch_on_this"/>
  <document name="ch">
    <entity name="applications" pk="id" dataSource="ch"
            query="SELECT id, updated FROM applications limit 10">
      <entity name="comment" dataSource="ch"
              query="SELECT comment FROM ratings WHERE app = ${applications.id}">
        <field name="comment" column="comment"/>
      </entity>
    </entity>
  </document>
</dataConfig>




 




RE: Problem with DataImportHandler and embedded entities

2010-04-21 Thread caman

Hard to tell.

Did you try making the child entity part of the main query with a subquery? I
don't think that is the issue, but it's worth a try:

SELECT id, updated, (SELECT comment FROM ratings WHERE app = appParent.id)
AS comment FROM applications appParent LIMIT 10

 

 

From: Jason Rutherglen [via Lucene]
Sent: Wednesday, April 21, 2010 10:33 AM
To: caman
Subject: Re: Problem with DataImportHandler and embedded entities

 

Caman,

I'm storing it. This is what I see when DataImportHandler verbose output is
turned on.

While the field names don't match, I am seeing that sub-queries are
being performed and data is being returned. It's just not making it into
the document.

<lst name="verbose-output">
  <lst name="entity:applications">
    <lst name="document#1">
      <str name="query">SELECT id, updated FROM applications limit 10</str>
      <str name="time-taken">0:0:0.9</str>
      <str>--- row #1-</str>
      <int name="id">407</int>
      <date name="updated">2009-11-02T06:35:48Z</date>
      <str>-</str>
      <lst name="entity:added">
        <str name="query">SELECT added FROM ratings WHERE app = 407</str>
        <str name="time-taken">0:0:0.8</str>
      </lst>
    </lst>
  </lst>
</lst>

On Wed, Apr 21, 2010 at 10:17 AM, caman [hidden email] wrote:



 
 Are you storing the comment field or indexing it?

 A field declared with stored="false" will not appear in the document.
 
 
 
 From: Jason Rutherglen [via Lucene]
 Sent: Wednesday, April 21, 2010 10:15 AM 
 To: caman 
 Subject: Problem with DataImportHandler and embedded entities 
 
 
 
 I'm using the following data-config.xml with DataImportHandler. I've
 never used embedded entities before, however I'm not seeing the comment
 show up in the document... I'm not sure what's up.

 <dataConfig>
   <dataSource type="JdbcDataSource" name="ch"
               driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/ch"
               batchSize="-1" user="ch" password="ch_on_this"/>
   <document name="ch">
     <entity name="applications" pk="id" dataSource="ch"
             query="SELECT id, updated FROM applications limit 10">
       <entity name="comment" dataSource="ch"
               query="SELECT comment FROM ratings WHERE app = ${applications.id}">
         <field name="comment" column="comment"/>
       </entity>
     </entity>
   </document>
 </dataConfig>
 
 
 
 
 
 
 
 

 


 




RE: Problem with DataImportHandler and embedded entities

2010-04-21 Thread caman

What is the unique id set in the schema?

 

 

 

From: Jason Rutherglen [via Lucene]
Sent: Wednesday, April 21, 2010 10:56 AM
To: caman
Subject: Re: Problem with DataImportHandler and embedded entities

 

The other issue now is that full-import is only importing one document, and
that's all, despite no limits etc. Odd...

On Wed, Apr 21, 2010 at 10:48 AM, Jason Rutherglen
[hidden email] wrote:


 I think it's working; it was the lack of the seemingly innocuous
 sub-entity pk="application_id". After adding that I'm seeing some
 data returned.
 
 On Wed, Apr 21, 2010 at 10:44 AM, Jason Rutherglen
 [hidden email] wrote:
 Something's off: for each row, it's performing the following 5
 sub-queries. Weird. Below is the updated data-config.xml (compared
 to the original email, I changed the field from comment to added).

 <lst name="document#5">
   <str>--- row #1-</str>
   <int name="id">876</int>
   <date name="updated">2009-11-02T06:36:28Z</date>
   <str>-</str>
   <lst name="entity:added">
     <str name="query">SELECT added FROM ratings WHERE app = 876</str>
     <str name="query">SELECT added FROM ratings WHERE app = 876</str>
     <str name="query">SELECT added FROM ratings WHERE app = 876</str>
     <str name="query">SELECT added FROM ratings WHERE app = 876</str>
     <str name="query">SELECT added FROM ratings WHERE app = 876</str>
     <str name="time-taken">0:0:0.0</str>
     <str name="time-taken">0:0:0.0</str>
     <str name="time-taken">0:0:0.0</str>
     <str name="time-taken">0:0:0.0</str>
     <str name="time-taken">0:0:0.0</str>
     <str>--- row #1-</str>
     <date name="added">2010-01-26T18:08:53Z</date>
     <str>-</str>
     <str>--- row #2-</str>
     <date name="added">2010-01-27T20:16:20Z</date>
     <str>-</str>
     <str>--- row #3-</str>
     <date name="added">2010-01-29T00:02:40Z</date>
     <str>-</str>
     <str>--- row #4-</str>
     <date name="added">2010-02-01T16:59:42Z</date>
     <str>-</str>
   </lst>
 </lst>
 
 <dataConfig>
   <dataSource type="JdbcDataSource" name="ch"
               driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/ch"
               batchSize="-1" user="ch" password="ch_on_this"/>
   <document name="ch">
     <entity name="applications" pk="id" dataSource="ch"
             query="SELECT id, updated FROM applications limit 10">
       <entity name="comment" dataSource="ch"
               query="SELECT * FROM ratings WHERE app = ${applications.id}">
         <field name="comment" column="comment"/>
         <field name="added" column="added"/>
       </entity>
     </entity>
   </document>
 </dataConfig>
 
 On Wed, Apr 21, 2010 at 10:41 AM, caman [hidden email] wrote:
 
 Hard to tell.

 Did you try making the child entity part of the main query with a subquery?
 I don't think that is the issue, but it's worth a try:

 SELECT id, updated, (SELECT comment FROM ratings WHERE app = appParent.id)
 AS comment FROM applications appParent LIMIT 10
 
 
 
 
 
 From: Jason Rutherglen [via Lucene]
 Sent: Wednesday, April 21, 2010 10:33 AM 
 To: caman 
 Subject: Re: Problem with DataImportHandler and embedded entities 
 
 
 
 Caman,

 I'm storing it. This is what I see when DataImportHandler verbose output is
 turned on.

 While the field names don't match, I am seeing that sub-queries are
 being performed and data is being returned. It's just not making it into
 the document.
 
 <lst name="verbose-output">
   <lst name="entity:applications">
     <lst name="document#1">
       <str name="query">SELECT id, updated FROM applications limit 10</str>
       <str name="time-taken">0:0:0.9</str>
       <str>--- row #1-</str>
       <int name="id">407</int>
       <date name="updated">2009-11-02T06:35:48Z</date>
       <str>-</str>
       <lst name="entity:added">
         <str name="query">SELECT added FROM ratings WHERE app = 407</str>
         <str name="time-taken">0:0:0.8</str>
       </lst>
     </lst>
   </lst>
 </lst>
 
 On Wed, Apr 21, 2010 at 10:17 AM, caman [hidden email] wrote:
 
 
 
 
 Are you storing the comment field or indexing it?

 A field declared with stored="false" will not appear in the document.
 
 
 
 From: Jason Rutherglen [via Lucene]
 Sent: Wednesday, April 21, 2010 10:15 AM 
 To: caman 
 Subject: Problem with DataImportHandler and embedded entities 
 
 
 
 I'm using the following data-config.xml with DataImportHandler.  I've 
 never used

RE: DIH dataimport.properties with

2010-04-20 Thread caman

Shawn,

Is this your custom implementation?

 For a delta-import, minDid comes from
 the maxDid value stored after the last successful import.

Are you updating the data table after the import is successful? How did you
handle this? I have a similar scenario, and your approach will work for my
use-case as well.

 

 

thanks

 

 

 

 

 

From: Shawn Heisey-4 [via Lucene]
Sent: Tuesday, April 20, 2010 4:35 PM
To: caman
Subject: Re: DIH dataimport.properties with

 

Michael, 

The SolrEntityProcessor looks very intriguing, but it won't work with 
the released 1.4 version.  If that's OK with you and it looks like it'll 
do what you want, feel free to ignore the rest of this. 

I'm also using MySQL as an import source for Solr.  I was unable to use 
the last_index_time because my database doesn't have a field I can match 
against it.  I believe you can use something similar to the method that 
I came up with.  The point of this post is to show you how to inject 
values from outside Solr into a DIH request rather than have Solr 
provide the milestone that indicates new content. 

Here's a simplified version of my URL template and entity configuration
in data-config.xml. The did field in my database is an autoincrement
BIGINT serving as my primary key, but something similar could likely be
cooked up with timestamps too:

http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dataTable=DATATABLE&minDid=MINDID&maxDid=MAXDID

<entity name="dataTable" pk="did"
        query="SELECT * FROM ${dataimporter.request.dataTable} WHERE
               did > ${dataimporter.request.minDid} AND
               did <= ${dataimporter.request.maxDid}"
        deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
        deltaImportQuery="SELECT * FROM ${dataimporter.request.dataTable} WHERE
               did > ${dataimporter.request.minDid} AND
               did <= ${dataimporter.request.maxDid}">
</entity>

 

If I am doing a full-import, I set minDid to zero and maxDid to the 
highest value in the database.  For a delta-import, minDid comes from 
the maxDid value stored after the last successful import. 

The deltaQuery is required, but in my case it is a throw-away query that
just tells Solr the delta-import needs to be run. My query and
deltaImportQuery are identical, though yours may not be.

Good luck, no matter how you choose to approach this. 

Shawn 


On 4/18/2010 9:02 PM, Michael Tibben wrote: 


 I don't really understand how this will help. Can you elaborate?

 Do you mean that the last_index_time can be imported from somewhere
 outside solr? But I need to be able to *set* what last_index_time is
 stored in dataimport.properties, not get properties from somewhere else.
 
 
 
 On 18/04/10 10:02, Lance Norskog wrote: 
 The SolrEntityProcessor allows you to query a Solr instance and use 
 the results as DIH properties. You would have to create your own 
 regular query to do the delta-import instead of using the delta-import 
 feature. 






 




RE: dismax vs the standard query handlers

2010-04-20 Thread caman

Your answers are here; the wiki describes it pretty well:

http://wiki.apache.org/solr/DisMaxRequestHandler

 

 

 

From: Sandhya Agarwal [via Lucene]
Sent: Tuesday, April 20, 2010 9:40 PM
To: caman
Subject: dismax vs the standard query handlers

 

Hello, 

What are the advantages of using the “dismax” query handler vs the “standard”
query handler? As I understand it, “dismax” queries are parsed differently and
provide more flexibility w.r.t. score boosting etc. Do we have any more
reasons?

Thanks, 
Sandhya 




 




RE: DIH questions

2010-04-15 Thread caman

I had a similar requirement and was not able to figure it out at that time. I
was able to use some SQL magic to create a concatenated string for the
sub-entities and then process it in a transformer, which may or may not work
for your use-case. Just a thought.

Please mention the specifics here and I can see if anything can be done.

 

Thanks

James

http://www.click2money.com

 

 

From: Blargy [via Lucene]
Sent: Thursday, April 15, 2010 4:28 PM
To: caman
Subject: Re: DIH questions

 

Is there any way that a sub-entity can delete/rewrite fields of the
document? Is there any way sub-entities can get access to the document's
current value for a given field?


 




RE: CopyField

2010-04-15 Thread caman

As far as I know, no.

But why don't you keep another column, 'source_final', and populate it with
the value from source1 or source2 depending on which has a value (look at
transformers, maybe the ScriptTransformer)? Then, in schema.xml:

  <copyField source="source_final" dest="dest"/>
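A sketch of that transformer in data-config.xml (assuming DIH's
ScriptTransformer; the entity and column names are illustrative):

  <dataConfig>
    <script><![CDATA[
      // copy whichever source column has a value into source_final
      function pickSource(row) {
        var v = row.get('source1');
        row.put('source_final', v != null ? v : row.get('source2'));
        return row;
      }
    ]]></script>
    <document>
      <entity name="item" transformer="script:pickSource"
              query="SELECT id, source1, source2 FROM items"/>
    </document>
  </dataConfig>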

 

Thanks

James

http://www.click2money.com

 

 

From: Blargy [via Lucene]
Sent: Thursday, April 15, 2010 5:54 PM
To: caman
Subject: CopyField

 

Is there any way to instruct copyField to overwrite an existing field, or only
accept the first one?

  <copyField source="source1" dest="dest"/>
  <copyField source="source2" dest="dest"/>

Basically I want to copy source1 to dest (if it exists). If source1 doesn't
exist, then copy source2 into dest.

Is this possible?


 




Re: dynamic categorization & transactional data

2010-03-20 Thread caman

@Grant
Less than a minute. If we go with the meta-retrieval from the index, we
will have to keep the index updated down to seconds, but that may not scale
well. Probably a hybrid approach?
I will look into a classifier. Thanks.





Grant Ingersoll-6 wrote:
 
 
 On Mar 18, 2010, at 2:44 PM, caman wrote:
 
 
 1) Took care of the first one by Transformer.
 
 This is often also something done by a classifier that is trained to deal
 with all the statistical variations in your text.  Tools like Weka,
 Mahout, OpenNLP, etc. can be applied here.
 
 2) Any input on 2 please? I need to store # of views and popularity with
 each document and that can change pretty often. Recommended to use
 database
 or can this be updated to SOLr directly? My issue with DB is that with
 every
 SOLR search hit, will have to do DB hit to retrieve meta-data. 
 
 Define often, please.  Less than a minute or more than a minute?
 
 
 Any input is appreciated please
 
 caman wrote:
 
 Hello all,
 
 Please see below.any help much appreciated.
 1) Extracting data out of a text field to assign a category for certain
 configured words. e.g. If the text is Google does it again with
 Android 
  and if 'Google' and 'Android' are the configured words, I want to be able
 to assign the article to tags 'Google' and 'Android' and 'Technical' .
 Can
 I do this with a custom filter during analysis? Similarly setting up
 categories for each article based on keywords in the text.
 2) How about using SOLR as transactional datastore? Need to keep track
 of
 rating for each document. Would 'ExternalFileField' be good choice for
 this use-case?
 
 Thanks in advance.
 
 
 -- 
 View this message in context:
 http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27949786.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 




Re: dynamic categorization & transactional data

2010-03-18 Thread caman

1) Took care of the first one with a Transformer.
2) Any input on 2, please? I need to store the number of views and popularity
with each document, and that can change pretty often. Is it recommended to use
a database, or can this be updated in SOLR directly? My issue with the DB is
that with every SOLR search hit, I will have to do a DB hit to retrieve the
meta-data.

Any input is appreciated, please.

caman wrote:
 
 Hello all,
 
 Please see below.any help much appreciated.
 1) Extracting data out of a text field to assign a category for certain
 configured words. e.g. If the text is Google does it again with Android 
  and if 'Google' and 'Android' are the configured words, I want to be able
 to assign the article to tags 'Google' and 'Android' and 'Technical' . Can
 I do this with a custom filter during analysis? Similarly setting up
 categories for each article based on keywords in the text.
 2) How about using SOLR as transactional datastore? Need to keep track of
 rating for each document. Would 'ExternalFileField' be good choice for
 this use-case?
 
 Thanks in advance.
 




Re: dynamic categorization & transactional data

2010-03-18 Thread caman

David,

Much appreciated. This gives me enough to work with.
I missed one important point: our data changes pretty frequently, which means
we may be running deltas every 5-10 minutes. In-memory should work.
thanks





David Smiley @MITRE.org wrote:
 
 You'll probably want to influence your relevancy on this popularity number
 that is changing often.  ExternalFileField looks like a possibility though
 I haven't used it.  Another would be using an in-memory cache which stores
 all popularity numbers for any data that has its popularity updated since
 the last index update (say since the previous night).  On second thought,
 it may need to be absolutely all of them but these are just #s so no big
 deal?  You could then customize a ValueSource subclass which gets data
 from this fast in-memory up to date source.  See FileFloatSource for an
 example that uses a file instead of an in-memory structure.
 
 ~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
 
 
 On Mar 18, 2010, at 2:44 PM, caman wrote:
 
 2) Any input on 2 please? I need to store # of views and popularity with
 each document and that can change pretty often. Recommended to use
 database
 or can this be updated to SOLr directly? My issue with DB is that with
 every
 SOLR search hit, will have to do DB hit to retrieve meta-data. 
 
 Any input is appreciated please
 
 
 
 




dynamic categorization & transactional data

2010-03-04 Thread caman

Hello all,

Please see below; any help much appreciated.
1) Extracting data out of a text field to assign a category for certain
configured words. e.g. If the text is "Google does it again with Android"
and if 'Google' and 'Android' are the configured words, I want to be able to
assign the article to the tags 'Google' and 'Android' and 'Technical'. Can I do
this with a custom filter during analysis? Similarly, setting up categories
for each article based on keywords in the text.
2) How about using SOLR as a transactional datastore? I need to keep track of
the rating for each document. Would 'ExternalFileField' be a good choice for
this use-case?
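For reference, ExternalFileField is wired up in schema.xml along these lines (a
sketch; the field and type names are illustrative, and the values live in an
external_<fieldname> file in the index directory that can be refreshed without
reindexing):

  <fieldType name="ratingFile" class="solr.ExternalFileField"
             keyField="id" defVal="0" valType="pfloat"
             stored="false" indexed="false"/>
  <field name="rating" type="ratingFile"/>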

Thanks in advance.



SOLR Index or database

2010-03-03 Thread caman

Hello All,

Just struggling with the question of whether SOLR or a database would be the
better option for me. Here are my requirements.
We index about 600+ news sites/blogs into our system. The only information we
store locally is the title, link, and article snippet. We are able to index all
these sources into the SOLR index and it works perfectly.
This is where it gets tricky:
we need to store certain meta information as well, e.g.
1. Rating/popularity of an article
2. Sharing of articles between users
3. How many times an article is viewed
4. Comments on each article

So far, we are planning to store the meta-information in the database and link
this data with a document in the index. When a user opens the page, results
are combined from the index and the database to render the view.

Any reservations about the above architecture?
Is SOLR the right fit in this case? We do need full-text search, so SOLR is a
no-brainer IMHO, but I would love to hear the community's view.

Any feedback appreciated.

thanks







Re: Indexing an oracle warehouse table

2010-02-03 Thread caman

Thanks. I will give this a shot.

Alexey-34 wrote:
 
 What would be the right way to point out which field contains the term
 searched for?
 I would use highlighting for all of these fields and then post-process the
 Solr response in order to check the highlighting tags. But I don't have so
 many fields usually, and I don't know if it's possible to configure Solr
 to highlight fields using '*' as dynamic fields.
 
 On Wed, Feb 3, 2010 at 2:43 AM, caman aboxfortheotherst...@gmail.com
 wrote:

 Thanks all. I am on track.
 Another question:
 What would be the right way to point out which field contains the term
 searched for.
 e.g. If I search for SOLR and if the term exist in field788 for a
 document,
 how do I pinpoint that which field has the term.
 I copied all the fields in field called 'body' which makes searching
 easier
 but would be nice to show the field which has that exact term.

 thanks

 caman wrote:

 Hello all,

 hope someone can point me to right direction. I am trying to index an
 oracle warehouse table(TableA) with 850 columns. Out of the structure
 about 800 fields are CLOBs and are good candidate to enable full-text
 searching. Also have few columns which has relational link to other
 tables. I am clean on how to create a root entity and then pull data
 from
 other relational link as child entities.  Most columns in TableA are
 named
 as field1,field2...field800.
 Now my question is how to organize the schema efficiently:
 First option:
 if my query is 'select * from TableA', do I define <field name="attr1"
 column="FIELD1"/> for each of those 800 columns? Seems cumbersome; maybe
 I can write a script to generate the XML instead of handwriting both
 data-config.xml and schema.xml.
 OR
 Don't define any <field name="attr1" column="FIELD1"/>, so that the columns
 in SOLR will be the same as in the database table. But then the questions
 are: 1) How do I define a unique field in this scenario? 2) How do I copy
 all the text fields to a common field for easy searching?

 Any helpful is appreciated. Please feel free to suggest any alternative
 way.

 Thanks









 
 




Re: Indexing an oracle warehouse table

2010-02-02 Thread caman

Anyone please?


caman wrote:
 
 Hello all,
 
 hope someone can point me to right direction. I am trying to index an
 oracle warehouse table(TableA) with 850 columns. Out of the structure
 about 800 fields are CLOBs and are good candidate to enable full-text
 searching. Also have few columns which has relational link to other
 tables. I am clean on how to create a root entity and then pull data from
 other relational link as child entities.  Most columns in TableA are named
 as field1,field2...field800.
 Now my question is how to organize the schema efficiently: 
 First option:
 if my query is 'select * from TableA', do I define <field name="attr1"
 column="FIELD1"/> for each of those 800 columns? Seems cumbersome; maybe
 I can write a script to generate the XML instead of handwriting both
 data-config.xml and schema.xml.
 OR
 Don't define any <field name="attr1" column="FIELD1"/>, so that the columns
 in SOLR will be the same as in the database table. But then the questions
 are: 1) How do I define a unique field in this scenario? 2) How do I copy
 all the text fields to a common field for easy searching?
 
 Any helpful is appreciated. Please feel free to suggest any alternative
 way.
 
 Thanks
 
 
 
 
 
 




Re: Indexing an oracle warehouse table

2010-02-02 Thread caman

Ron,

Much appreciated. The search requirements are:
1) Enable search/faceting on author, service, and datetime.
2) Enable full-text search on all the text columns, which are named col1 ...
col800+ -- a total of more than 800 columns.

Here is what I did so far: I defined the entities in db-config.xml without any
column definitions in the file, which basically means I want to keep the field
names the same as in the database.
Now in schema.xml I have a <field> tag for each database field retrieved by
the SQL queries in db-config.xml, which is more than 800+ (I did not write
this by hand; I wrote a Groovy script to generate it for me from the
database).

Multi-valued: yes, this is what I am using to copy all the fields
col1 ... col800+ to one multi-valued field. That field is set as the default
for search.

You are right about going to the original data source, but we had to take a
different approach: the original source is all XML files which do not follow
a standard schema for their structure.

I hope what I mentioned above makes sense. I appreciate the response.





Ron Chan wrote:
 
 it depends on what the search requirements are, so without knowing the
 details here are some vague pointers 
 
 you may only need to have fields for the columns you are going to be
 categorizing and searching on, this may be a small subset of the 800 and
 the rest can go into one large field to fulfil the full text search 
 
 another thing to look into is the multi value fields, this can sometimes
 replace the one-to-many relationships in database 
 
 also it may sometimes be worth while going to the original data source
 rather than the warehouse table, as this is already flattened and
 denormalised, the flattening and denormalizing will most likely be done a
 different way when solr indexing database type data, highly likely you
 will end up with less rows and less columns in the solr index, as each
 solr document can be seen as multi-dimensional 
 
 
 - Original Message - 
 From: caman aboxfortheotherst...@gmail.com 
 To: solr-user@lucene.apache.org 
 Sent: Tuesday, 2 February, 2010 1:23:01 AM 
 Subject: Indexing an oracle warehouse table 
 
 
 Hello all, 
 
 hope someone can point me to right direction. I am trying to index an
 oracle 
 warehouse table(TableA) with 850 columns. Out of the structure about 800 
 fields are CLOBs and are good candidate to enable full-text searching.
 Also 
 have few columns which has relational link to other tables. I am clean on 
 how to create a root entity and then pull data from other relational link
 as 
 child entities. Most columns in TableA are named as 
 field1,field2...field800. 
 Now my question is how to organize the schema efficiently: 
 First option: 
 if my query is 'select * from TableA', Do I define field name=attr1 
 column=FIELD1 / for each of those 800 columns? Seems cumbersome. May be 
 can write a script to generate XML instead of handwriting both in 
 data-config.xml and schema.xml. 
 OR 
 Dont define any field name=attr1 column=FIELD1 / so that column in 
 SOLR will be same as in the database table. But questions are 1)How do I 
 define unique field in this scenario? 2) How to copy all the text fields
 to 
 a common field for easy searching? 
 
 Any helpful is appreciated. Please feel free to suggest any alternative
 way. 
 
 Thanks 
 
 
 
 
 
 -- 
 View this message in context:
 http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27414263.html
  
 Sent from the Solr - User mailing list archive at Nabble.com. 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27425156.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing an oracle warehouse table

2010-02-02 Thread caman

Alexey,

This is exactly what I was looking for. Thank you, thank you, thank you!
I should have read the documentation a little more closely.
Much appreciated. 
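
For anyone following the thread, a sketch of the UUID technique from the
UniqueKey wiki page Alexey points to below; it assumes the solr.UUIDField
type, and with default="NEW" Solr generates an id when none is supplied:

  <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
  <field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>
  <!-- ... other fields ... -->
  <uniqueKey>id</uniqueKey>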

Alexey-34 wrote:
 
 Dont define any <field name="attr1" column="FIELD1" /> so that column in
 SOLR will be same as in the database table.
 Correct
 You can define a dynamic field <dynamicField name="field*" type="text"
 indexed="true" stored="true"/> ( see
 http://wiki.apache.org/solr/SchemaXml#Dynamic_fields )
 
 1)How do I define unique field in this scenario?
 You can create primary key into database or generate it directly in
 Solr ( see UUID techniques http://wiki.apache.org/solr/UniqueKey )
 
 2) How to copy all the text fields to a common field for easy searching?
 <copyField source="field*" dest="field"/> ( see
 http://wiki.apache.org/solr/SchemaXml#Copy_Fields )
 
 
 On Tue, Feb 2, 2010 at 4:22 AM, caman aboxfortheotherst...@gmail.com
 wrote:

 Hello all,

 hope someone can point me to right direction. I am trying to index an
 oracle
 warehouse table(TableA) with 850 columns. Out of the structure about 800
 fields are CLOBs and are good candidate to enable full-text searching.
 Also
 have few columns which has relational link to other tables. I am clean on
 how to create a root entity and then pull data from other relational link
 as
 child entities.  Most columns in TableA are named as
 field1,field2...field800.
 Now my question is how to organize the schema efficiently:
 First option:
 if my query is 'select * from TableA', Do I define <field name="attr1"
 column="FIELD1" /> for each of those 800 columns?   Seems cumbersome. May
 be
 can write a script to generate XML instead of handwriting both in
 data-config.xml and schema.xml.
 OR
 Dont define any <field name="attr1" column="FIELD1" /> so that column in
 SOLR will be same as in the database table. But questions are 1)How do I
 define unique field in this scenario? 2) How to copy all the text fields
 to
 a common field for easy searching?

 Any helpful is appreciated. Please feel free to suggest any alternative
 way.

 Thanks





 --
 View this message in context:
 http://old.nabble.com/Indexing-a-oracle-warehouse-table-tp27414263p27414263.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 

-- 
View this message in context: 
http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27426206.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing an oracle warehouse table

2010-02-02 Thread caman

Thanks all, I am on track.
Another question: what would be the right way to point out which field
contains the term searched for?
E.g., if I search for SOLR and the term exists in field788 for a document,
how do I pinpoint which field has the term?
I copied all the fields into a field called 'body', which makes searching
easier, but it would be nice to show the field that contains the exact term.
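
One possibility is highlighting: if the per-column fields are stored, a
request along these lines (URL illustrative; this assumes your Solr version
supports field globs in hl.fl) asks Solr to highlight the query terms in
whichever field* column contains them, even though the query itself runs
against the body catch-all:

  http://localhost:8983/solr/select?q=body:SOLR&hl=true&hl.fl=field*

The field names in the highlighting section of the response then indicate
where the term actually occurred.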

thanks

caman wrote:
 
 Hello all,
 
 hope someone can point me to right direction. I am trying to index an
 oracle warehouse table(TableA) with 850 columns. Out of the structure
 about 800 fields are CLOBs and are good candidate to enable full-text
 searching. Also have few columns which has relational link to other
 tables. I am clean on how to create a root entity and then pull data from
 other relational link as child entities.  Most columns in TableA are named
 as field1,field2...field800.
 Now my question is how to organize the schema efficiently: 
 First option:
 if my query is 'select * from TableA', Do I define <field name="attr1"
 column="FIELD1" /> for each of those 800 columns?   Seems cumbersome. May
 be can write a script to generate XML instead of handwriting both in
 data-config.xml and schema.xml. 
 OR
 Dont define any <field name="attr1" column="FIELD1" /> so that column in
 SOLR will be same as in the database table. But questions are 1)How do I
 define unique field in this scenario? 2) How to copy all the text fields
 to a common field for easy searching? 
 
 Any helpful is appreciated. Please feel free to suggest any alternative
 way.
 
 Thanks
 
 
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27429352.html
Sent from the Solr - User mailing list archive at Nabble.com.



Indexing an oracle warehouse table

2010-02-01 Thread caman

Hello all,

I hope someone can point me in the right direction. I am trying to index an
Oracle warehouse table (TableA) with 850 columns. About 800 of them are
CLOBs and are good candidates for full-text searching. I also have a few
columns with relational links to other tables. I am clear on how to create a
root entity and then pull data from the relational links as child entities.
Most columns in TableA are named field1, field2 ... field800.
Now my question is how to organize the schema efficiently:
First option:
if my query is 'select * from TableA', do I define <field name="attr1"
column="FIELD1" /> for each of those 800 columns? That seems cumbersome;
maybe I can write a script to generate the XML instead of handwriting both
data-config.xml and schema.xml.
OR
Don't define any <field name="attr1" column="FIELD1" /> at all, so that each
column in Solr keeps the same name as in the database table. But then two
questions: 1) How do I define a unique field in this scenario? 2) How do I
copy all the text fields to a common field for easy searching?

Any help is appreciated. Please feel free to suggest any alternative way.

Thanks





-- 
View this message in context: 
http://old.nabble.com/Indexing-a-oracle-warehouse-table-tp27414263p27414263.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Document model suggestion

2009-12-21 Thread caman

Lance,
Makes sense. We are playing around with keeping the security model
completely out of the index and filtering results before display based on
access rights, but the approach you suggested is not ruled out completely.
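
For the record, a sketch of the in-index approach discussed in this thread:
a multi-valued field holding the users (or roles) allowed to see each
document, with the application appending a filter query on every search. The
field and value names here are made up for illustration:

  <field name="acl" type="string" indexed="true" stored="false"
         multiValued="true"/>

  ...&fq=acl:(user42 OR role_editors)

Since the fq is appended server-side, users cannot see or tamper with it,
and filter queries are cached independently of the main query.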
thanks

Lance Norskog-2 wrote:
 
 Yes, you would have 'role' as a multi-valued field. When you add
 someone to a role, you don't have to re-index. That's all.
 
 On Thu, Dec 17, 2009 at 12:55 PM, caman aboxfortheotherst...@gmail.com
 wrote:

 Are you suggesting that roles should be maintained in the index? We do
 manage
 out authentication based on roles but at granular level, user rights play
 a
 big role as well.
 I know we need to compromise, just need to find a balance.

 Thanks


 Lance Norskog-2 wrote:

 Role-based authentication is one level of sophistication up from
 user-based authentication. Users can have different roles, and
 authentication goes against roles. Documents with multiple viewers
 would be assigned special roles. All users would also have their own
 matching role.

 On Tue, Dec 15, 2009 at 10:01 AM, caman aboxfortheotherst...@gmail.com
 wrote:

 Erick,
 I know what you mean.
 Wonder if it is actually cleaner to keep the authorization  model out
 of
 solr index and filter the data at client side based on the user access
 rights.
 Thanks all for help.



 Erick Erickson wrote:

 Yes, that should work. One hard part is what happens if your
 authorization model has groups, especially when membership
 in those groups changes. Then you have to go in and update
 all the affected docs.

 FWIW
 Erick

 On Tue, Dec 15, 2009 at 12:24 PM, caman
 aboxfortheotherst...@gmail.comwrote:


 Shalin,

 Thanks. much appreciated.
 Question about:
  That is usually what people do. The hard part is when some
 documents
 are
 shared across multiple users. 

 What do you recommend when documents has to be shared across multiple
 users?
 Can't I just multivalue a field with all the users who has access to
 the
 document?


 thanks

 Shalin Shekhar Mangar wrote:
 
  On Tue, Dec 15, 2009 at 7:26 AM, caman
  aboxfortheotherst...@gmail.comwrote:
 
 
  Appreciate any guidance here please. Have a master-child table
 between
  two
  tables 'TA' and 'TB' where form is the master table. Any row in TA
 can
  have
  multiple row in TB.
  e.g. row in TA
 
  id---name
  1---tweets
 
  TB:
  id|ta_id|field0|field1|field2.|field20|created_by
  1|1|value1|value2|value2.|value20|User1
 
  <snip/>
 
 
  This works fine and index the data.But all the data for a row in
 TA
 gets
  combined in one document(not desirable).
  I am not clear on how to
 
  1) separate a particular row from the search results.
  e.g. If I search for 'Android' and there are 5 rows for android in
 TB
 for
  a
  particular instance in TA, would like to show them separately to
 user
 and
  if
  the user click on any of the row,point them to an attached URL in
 the
  application. Should a separate index be maintained for each row in
 TB?TB
  can
  have millions of rows.
 
 
  The easy answer is that whatever you want to show as results should
 be
 the
  thing that you index as documents. So if you want to show tweets as
  results,
  one document should represent one tweet.
 
  Solr is different from relational databases and you should not
 think
 about
  both the same way. De-normalization is the way to go in Solr.
 
 
  2) How to protect one user's data from another user. I guess I can
 keep
 a
  column for a user_id in the schema and append that filter
 automatically
  when
  I search through SOLR. Any better alternatives?
 
 
  That is usually what people do. The hard part is when some
 documents
 are
  shared across multiple users.
 
 
  Bear with me if these are newbie questions please, this is my
 first
 day
  with
  SOLR.
 
 
  No problem. Welcome to Solr!
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 

 --
 View this message in context:
 http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 View this message in context:
 http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 Lance Norskog
 goks...@gmail.com



 --
 View this message in context:
 http://old.nabble.com/Document-model-suggestion-tp26784346p26834798.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 Lance Norskog
 goks...@gmail.com
 
 

-- 
View this message in context: 
http://old.nabble.com/Document-model-suggestion-tp26784346p26881664.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Document model suggestion

2009-12-17 Thread caman

Are you suggesting that roles should be maintained in the index? We do
manage our authentication based on roles, but at a granular level user
rights play a big role as well.
I know we need to compromise; we just need to find the right balance.

Thanks


Lance Norskog-2 wrote:
 
 Role-based authentication is one level of sophistication up from
 user-based authentication. Users can have different roles, and
 authentication goes against roles. Documents with multiple viewers
 would be assigned special roles. All users would also have their own
 matching role.
 
 On Tue, Dec 15, 2009 at 10:01 AM, caman aboxfortheotherst...@gmail.com
 wrote:

 Erick,
 I know what you mean.
 Wonder if it is actually cleaner to keep the authorization  model out of
 solr index and filter the data at client side based on the user access
 rights.
 Thanks all for help.



 Erick Erickson wrote:

 Yes, that should work. One hard part is what happens if your
 authorization model has groups, especially when membership
 in those groups changes. Then you have to go in and update
 all the affected docs.

 FWIW
 Erick

 On Tue, Dec 15, 2009 at 12:24 PM, caman
 aboxfortheotherst...@gmail.comwrote:


 Shalin,

 Thanks. much appreciated.
 Question about:
  That is usually what people do. The hard part is when some documents
 are
 shared across multiple users. 

 What do you recommend when documents has to be shared across multiple
 users?
 Can't I just multivalue a field with all the users who has access to
 the
 document?


 thanks

 Shalin Shekhar Mangar wrote:
 
  On Tue, Dec 15, 2009 at 7:26 AM, caman
  aboxfortheotherst...@gmail.comwrote:
 
 
  Appreciate any guidance here please. Have a master-child table
 between
  two
  tables 'TA' and 'TB' where form is the master table. Any row in TA
 can
  have
  multiple row in TB.
  e.g. row in TA
 
  id---name
  1---tweets
 
  TB:
  id|ta_id|field0|field1|field2.|field20|created_by
  1|1|value1|value2|value2.|value20|User1
 
  <snip/>
 
 
  This works fine and index the data.But all the data for a row in TA
 gets
  combined in one document(not desirable).
  I am not clear on how to
 
  1) separate a particular row from the search results.
  e.g. If I search for 'Android' and there are 5 rows for android in
 TB
 for
  a
  particular instance in TA, would like to show them separately to
 user
 and
  if
  the user click on any of the row,point them to an attached URL in
 the
  application. Should a separate index be maintained for each row in
 TB?TB
  can
  have millions of rows.
 
 
  The easy answer is that whatever you want to show as results should
 be
 the
  thing that you index as documents. So if you want to show tweets as
  results,
  one document should represent one tweet.
 
  Solr is different from relational databases and you should not think
 about
  both the same way. De-normalization is the way to go in Solr.
 
 
  2) How to protect one user's data from another user. I guess I can
 keep
 a
  column for a user_id in the schema and append that filter
 automatically
  when
  I search through SOLR. Any better alternatives?
 
 
  That is usually what people do. The hard part is when some documents
 are
  shared across multiple users.
 
 
  Bear with me if these are newbie questions please, this is my first
 day
  with
  SOLR.
 
 
  No problem. Welcome to Solr!
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 

 --
 View this message in context:
 http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 View this message in context:
 http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 Lance Norskog
 goks...@gmail.com
 
 

-- 
View this message in context: 
http://old.nabble.com/Document-model-suggestion-tp26784346p26834798.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Document model suggestion

2009-12-15 Thread caman

Shalin,

Thanks, much appreciated.
A question about:
 That is usually what people do. The hard part is when some documents are
shared across multiple users. 

What do you recommend when documents have to be shared across multiple
users? Can't I just use a multi-valued field listing all the users who have
access to the document?


thanks

Shalin Shekhar Mangar wrote:
 
 On Tue, Dec 15, 2009 at 7:26 AM, caman
 aboxfortheotherst...@gmail.comwrote:
 

 Appreciate any guidance here please. Have a master-child table between
 two
 tables 'TA' and 'TB' where form is the master table. Any row in TA can
 have
 multiple row in TB.
 e.g. row in TA

 id---name
 1---tweets

 TB:
 id|ta_id|field0|field1|field2.|field20|created_by
 1|1|value1|value2|value2.|value20|User1

 <snip/>
 

 This works fine and index the data.But all the data for a row in TA gets
 combined in one document(not desirable).
 I am not clear on how to

 1) separate a particular row from the search results.
 e.g. If I search for 'Android' and there are 5 rows for android in TB for
 a
 particular instance in TA, would like to show them separately to user and
 if
 the user click on any of the row,point them to an attached URL in the
 application. Should a separate index be maintained for each row in TB?TB
 can
 have millions of rows.

 
 The easy answer is that whatever you want to show as results should be the
 thing that you index as documents. So if you want to show tweets as
 results,
 one document should represent one tweet.
 
 Solr is different from relational databases and you should not think about
 both the same way. De-normalization is the way to go in Solr.
 
 
 2) How to protect one user's data from another user. I guess I can keep a
 column for a user_id in the schema and append that filter automatically
 when
 I search through SOLR. Any better alternatives?


 That is usually what people do. The hard part is when some documents are
 shared across multiple users.
 
 
 Bear with me if these are newbie questions please, this is my first day
 with
 SOLR.


 No problem. Welcome to Solr!
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Document model suggestion

2009-12-15 Thread caman

Erick,
I know what you mean.
I wonder if it is actually cleaner to keep the authorization model out of
the Solr index and filter the data on the client side based on the user's
access rights.
Thanks, all, for the help.



Erick Erickson wrote:
 
 Yes, that should work. One hard part is what happens if your
 authorization model has groups, especially when membership
 in those groups changes. Then you have to go in and update
 all the affected docs.
 
 FWIW
 Erick
 
 On Tue, Dec 15, 2009 at 12:24 PM, caman
 aboxfortheotherst...@gmail.comwrote:
 

 Shalin,

 Thanks. much appreciated.
 Question about:
  That is usually what people do. The hard part is when some documents
 are
 shared across multiple users. 

 What do you recommend when documents has to be shared across multiple
 users?
 Can't I just multivalue a field with all the users who has access to the
 document?


 thanks

 Shalin Shekhar Mangar wrote:
 
  On Tue, Dec 15, 2009 at 7:26 AM, caman
  aboxfortheotherst...@gmail.comwrote:
 
 
  Appreciate any guidance here please. Have a master-child table between
  two
  tables 'TA' and 'TB' where form is the master table. Any row in TA can
  have
  multiple row in TB.
  e.g. row in TA
 
  id---name
  1---tweets
 
  TB:
  id|ta_id|field0|field1|field2.|field20|created_by
  1|1|value1|value2|value2.|value20|User1
 
  <snip/>
 
 
  This works fine and index the data.But all the data for a row in TA
 gets
  combined in one document(not desirable).
  I am not clear on how to
 
  1) separate a particular row from the search results.
  e.g. If I search for 'Android' and there are 5 rows for android in TB
 for
  a
  particular instance in TA, would like to show them separately to user
 and
  if
  the user click on any of the row,point them to an attached URL in the
  application. Should a separate index be maintained for each row in
 TB?TB
  can
  have millions of rows.
 
 
  The easy answer is that whatever you want to show as results should be
 the
  thing that you index as documents. So if you want to show tweets as
  results,
  one document should represent one tweet.
 
  Solr is different from relational databases and you should not think
 about
  both the same way. De-normalization is the way to go in Solr.
 
 
  2) How to protect one user's data from another user. I guess I can
 keep
 a
  column for a user_id in the schema and append that filter
 automatically
  when
  I search through SOLR. Any better alternatives?
 
 
  That is usually what people do. The hard part is when some documents
 are
  shared across multiple users.
 
 
  Bear with me if these are newbie questions please, this is my first
 day
  with
  SOLR.
 
 
  No problem. Welcome to Solr!
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 

 --
 View this message in context:
 http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html
Sent from the Solr - User mailing list archive at Nabble.com.



Document model suggestion

2009-12-14 Thread caman

I would appreciate any guidance here, please. I have a master-child
relationship between two tables, 'TA' and 'TB', where TA is the master
table. Any row in TA can have multiple rows in TB.
e.g. a row in TA 

id---name
1---tweets

TB:
id|ta_id|field0|field1|field2...|field20|created_by
1|1|value1|value2|value2...|value20|User1

This is how I am trying to model this in SOLR
<document>
  <entity name="TA" query="select * from TA"
          deltaQuery="select id from TA where (last_updated >
            '${dataimporter.last_index_time}' or date_created >
            '${dataimporter.last_index_time}')"
          deltaImportQuery="select * from TA where ID='${dataimporter.delta.id}'">
    <field column="name" name="name" />
    <field column="name" name="nameSort" />
    <field column="name" name="alphaNameSort" />

    <entity name="TB"
            query="select id,field0,field1,field2,field3,field4,ta_id
              from TB where ta_id='${TA.id}'"
            deltaQuery="select ta_id from TB where (last_updated >
              '${dataimporter.last_index_time}' or date_created >
              '${dataimporter.last_index_time}')"
            parentDeltaQuery="select id from TA where id=${TB.ta_id}">
      <field name="dataId" column="id" />
      <field name="attr0" column="field0" />
      <field name="attr1" column="field1" />
      <field name="attr2" column="field2" />
      <field name="attr3" column="field3" />
      <field name="attr4" column="field4" />
    </entity>
  </entity>
</document>

This works fine and indexes the data, but all the data for a row in TA gets
combined into one document (not desirable).
I am not clear on how to:

1) Separate a particular row in the search results.
E.g., if I search for 'Android' and there are 5 rows for Android in TB for a
particular instance in TA, I would like to show them separately to the user
and, if the user clicks on any of the rows, point them to an attached URL in
the application. Should a separate index be maintained for each row in TB?
TB can have millions of rows. (A sketch of one alternative follows below.)
2) How do I protect one user's data from another user? I guess I can keep a
user_id column in the schema and append that filter automatically when I
search through Solr. Any better alternatives?
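
On (1), a sketch of the one-document-per-TB-row alternative suggested in the
replies earlier in this archive: TB becomes the root entity and each TB row
becomes its own Solr document, carrying the parent TA name along. The column
names are taken from the config above; the join syntax is an assumption:

  <document>
    <entity name="TB"
            query="select b.id, b.field0, b.field1, b.field2, b.field3,
              b.field4, b.ta_id, a.name from TB b, TA a where a.id = b.ta_id">
      <field name="dataId" column="id" />
      <field name="attr0" column="field0" />
      <field name="attr1" column="field1" />
      <field name="name" column="name" />
    </entity>
  </document>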

Please bear with me if these are newbie questions; this is my first day with
Solr.


Thanks

-- 
View this message in context: 
http://old.nabble.com/Document-model-suggestion-tp26784346p26784346.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: An issue with <commit/> using Solr Cell and multiple files

2009-09-10 Thread caman

You are right. 
I ran into the same thing: Windows curl gave me an error, but cygwin ran
without any issues.
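
For anyone stuck on plain Windows cmd (no cygwin), the form Lance shows
below -- double quotes around the XML, single quotes inside -- should work
directly (localhost URL used for illustration):

  curl http://localhost:8983/solr/update -H "Content-type:text/xml; charset=utf-8" --data-binary "<commit waitFlush='false'/>"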

thanks


Lance Norskog-2 wrote:
 
 It is a windows problem (or curl, whatever).  This works with
 double-quotes.
 
 C:\Users\work\Downloads\cygwin\home\work\curl-7.19.4\curl.exe
 http://localhost:8983/solr/update --data-binary "<commit/>" -H
 "Content-type:text/xml; charset=utf-8"
 Single-quotes inside double-quotes should work: "<commit
 waitFlush='false'/>"
 
 
 On Tue, Sep 8, 2009 at 11:59 AM, caman
 aboxfortheotherst...@gmail.comwrote:
 

 seems to be an error with curl




 Kevin Miller-17 wrote:
 
  I am getting the same error message.  I am running Solr on a Windows
  machine.  Is the commit command a curl command or is it a Solr command?
 
 
  Kevin Miller
  Web Services
 
  -Original Message-
  From: Grant Ingersoll [mailto:gsing...@apache.org]
  Sent: Tuesday, September 08, 2009 12:52 PM
  To: solr-user@lucene.apache.org
 Subject: Re: An issue with <commit/> using Solr Cell and multiple files
 
  solr/examples/exampledocs/post.sh does:
 curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml;
  charset=utf-8'
 
  Not sure if that helps or how it compares to the book.
 
  On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote:
 
  I am using the Solr nightly build from 8/11/2009.  I am able to index
  my documents using the Solr Cell but when I attempt to send the commit
 
  command I get an error.  I am using the example found in the Solr 1.4
  Enterprise Search Server book (recently released) found on page 84.
  It
  shows to commit the changes as follows (I am showing where my files
  are located not the example in the book):
 
 c:\curl\bin\curl http://echo12:8983/solr/update/ -H "Content-Type:
 text/xml" --data-binary '<commit waitFlush="false"/>'
 
  this give me this error: The system cannot find the file specified.
 
  I get the same error when I modify it to look like the following:
 
 c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit
 waitFlush="false"/>'
 c:\curl\bin\curl "http://echo12:8983/solr/update/" -H "Content-Type:
 text/xml" --data-binary '<commit waitFlush="false"/>'
 c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit />'
 c:\curl\bin\curl "http://echo12:8983/solr/update/" '<commit />'
 
  I am using the example configuration in Solr so my documents are found
 
  in the exampledocs folder also my curl program in located in the root
  directory which is the reason for the way the curl command is being
  executed.
 
  I would appreciate any information on where to look or how to get the
  commit command to execute after indexing multiple files.
 
  Kevin Miller
  Oklahoma Tax Commission
  Web Services
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
  using Solr/Lucene:
  http://www.lucidimagination.com/search
 
 
 

 --
 View this message in context:
 http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25352122.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Lance Norskog
 goks...@gmail.com
 
 

-- 
View this message in context: 
http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25394203.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: An issue with <commit/> using Solr Cell and multiple files

2009-09-08 Thread caman

Seems to be an error with curl.




Kevin Miller-17 wrote:
 
 I am getting the same error message.  I am running Solr on a Windows
 machine.  Is the commit command a curl command or is it a Solr command? 
 
 
 Kevin Miller
 Web Services
 
 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org] 
 Sent: Tuesday, September 08, 2009 12:52 PM
 To: solr-user@lucene.apache.org
 Subject: Re: An issue with <commit/> using Solr Cell and multiple files
 
 solr/examples/exampledocs/post.sh does:
 curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml;
 charset=utf-8'
 
 Not sure if that helps or how it compares to the book.
 
 On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote:
 
 I am using the Solr nightly build from 8/11/2009.  I am able to index 
 my documents using the Solr Cell but when I attempt to send the commit
 
 command I get an error.  I am using the example found in the Solr 1.4
 Enterprise Search Server book (recently released) found on page 84.   
 It
 shows to commit the changes as follows (I am showing where my files 
 are located not the example in the book):

 c:\curl\bin\curl http://echo12:8983/solr/update/ -H "Content-Type:
 text/xml" --data-binary '<commit waitFlush="false"/>'

 this give me this error: The system cannot find the file specified.

 I get the same error when I modify it to look like the following:

 c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit
 waitFlush="false"/>'
 c:\curl\bin\curl "http://echo12:8983/solr/update/" -H "Content-Type:
 text/xml" --data-binary '<commit waitFlush="false"/>'
 c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit />'
 c:\curl\bin\curl "http://echo12:8983/solr/update/" '<commit />'

 I am using the example configuration in Solr so my documents are found
 
 in the exampledocs folder also my curl program in located in the root 
 directory which is the reason for the way the curl command is being 
 executed.

 I would appreciate any information on where to look or how to get the 
 commit command to execute after indexing multiple files.

 Kevin Miller
 Oklahoma Tax Commission
 Web Services
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 

-- 
View this message in context: 
http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25352122.html
Sent from the Solr - User mailing list archive at Nabble.com.