indexing unique keys

2014-09-04 Thread Mark , N
I have a use case where we want to store unique keys (hashes), which would be
used to compare against another set of keys (hashes).

For example

Index set = { h1, h2, h3, h4 }

Comparison set = { h1, h2 }

Result set = { h1, h2 }

Would it be an advantage to store the index set in Solr instead of storing it
in a traditional database?

Thanks in advance






*Nipen Mark *
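
The comparison described in this message is a set intersection. A minimal Python sketch of it, assuming both sets fit in memory; the field name `hash` is hypothetical, and the `{!terms}` string refers to Solr's terms query parser, which is only available in newer Solr releases and should be checked against the version in use:

```python
def matching_keys(index_set, comparison_set):
    """Return the keys present in both sets, sorted for stable output."""
    return sorted(set(index_set) & set(comparison_set))


def terms_filter(field, keys):
    """Build a filter string for Solr's terms query parser.

    The field name passed in is an assumption made for illustration.
    """
    return "{!terms f=%s}%s" % (field, ",".join(keys))


index_set = {"h1", "h2", "h3", "h4"}
comparison_set = {"h1", "h2"}
result = matching_keys(index_set, comparison_set)
print(result)
print(terms_filter("hash", result))
```

Whether Solr or a database is the better home for the index set depends on what else is done with the keys; the sketch only shows the operation itself.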


search hit on multivalued fields

2012-08-03 Thread Mark , N
I have a multivalued field Text which is indexed, for example:

F1:  some value
F2: some value
Text = ( content of f1,f2)

When a user searches, I check only the Text field, but I also need to show
the user which field (F1 or F2) produced the search hit.
Is this possible in Solr?


-- 
Thanks,

*Nipen Mark *
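
One possible way to see which source field matched is to ask for highlighting on F1 and F2 while still searching Text. The sketch below only builds such a request URL. It assumes F1 and F2 are stored fields and that the standard `hl` and `hl.fl` parameters are enabled on the handler; the base URL is a placeholder.

```python
from urllib.parse import urlencode


def highlight_url(base_url, user_query, source_fields):
    """Search the Text field but request highlights for the source fields."""
    params = {
        "q": "Text:(%s)" % user_query,
        "hl": "true",
        "hl.fl": ",".join(source_fields),  # fields to report snippets for
    }
    return base_url + "?" + urlencode(params)


# Placeholder host and handler path.
url = highlight_url("http://localhost:8983/solr/select", "some value", ["F1", "F2"])
print(url)
```

In the response, the fields that come back with highlight snippets for a document would be the ones containing the matched terms.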


Re: filtering number and repeated contents

2012-06-07 Thread Mark , N
Thanks Jack, I will try an update processor.

By the way, does Solr store the tokenized content of a field if the field has
the property stored=true?







On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky j...@basetechnology.com wrote:

 My (very limited) understanding of boilerpipe in Tika is that it strips
 out short text, which is great for all the menu and navigation text, but
 the typical disclaimer at the bottom of an email is not very short and
 frequently can be longer than the email message body itself. You may have
 to resort to a custom update processor that is programmed with some
 disclaimer signature text strings to be removed from field values.

 -- Jack Krupansky

 -Original Message- From: Mark , N
 Sent: Tuesday, June 05, 2012 8:28 AM
 To: solr-user@lucene.apache.org
 Subject: filtering number and repeated contents


 Is it possible to filter out numbers and disclaimers (repeated content)
 while indexing to Solr?
 This is all surplus information and I do not want to index it.

 I have also tried the boilerpipe algorithm to remove surplus
 information from web pages, such as navigational elements, templates, and
 advertisements. I think it works well, but I would like to know whether I
 could also filter out disclaimer information, mainly in email texts.
 --
 Thanks,

 *Nipen Mark *




-- 
Thanks,

*Nipen Mark *


filtering number and repeated contents

2012-06-05 Thread Mark , N
Is it possible to filter out numbers and disclaimers (repeated content)
while indexing to Solr?
This is all surplus information and I do not want to index it.

I have also tried the boilerpipe algorithm to remove surplus
information from web pages, such as navigational elements, templates, and
advertisements. I think it works well, but I would like to know whether I
could also filter out disclaimer information, mainly in email texts.
-- 
Thanks,

*Nipen Mark *
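
As a rough illustration of the pre-indexing cleanup asked about here (and of the signature-string idea in Jack Krupansky's reply), the following Python sketch cuts the text at the first known disclaimer opener and drops purely numeric tokens. The marker strings are invented examples, and doing this on the client side rather than in a custom update processor is an assumption.

```python
import re

# Hypothetical disclaimer openers; a real list would come from your own mail corpus.
DISCLAIMER_MARKERS = [
    "This e-mail and any attachments are confidential",
    "Please consider the environment before printing",
]

# Standalone numbers such as 4521 or 3.14; digits inside words (abc1) are kept.
NUMBER_TOKEN = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def clean_for_indexing(text):
    """Cut the text at the first known disclaimer marker, then drop numeric tokens."""
    cut = len(text)
    for marker in DISCLAIMER_MARKERS:
        pos = text.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    body = NUMBER_TOKEN.sub(" ", text[:cut])
    return " ".join(body.split())  # collapse the whitespace left behind


sample = "Invoice 4521 is attached. This e-mail and any attachments are confidential and ..."
print(clean_for_indexing(sample))
```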


filtering footer information

2012-05-23 Thread Mark , N
Is it possible to filter certain repeated footer information out of text
documents while indexing to Solr?

Are there any built-in filters similar to stop word filters?




-- 
Thanks,

*Nipen Mark *


Re: wildcard and proximity searches

2010-10-05 Thread Mark N
Hi,

Were you successful in trying SOLR-1604 to allow wildcard queries in
phrases?

Also, does this plugin allow us to use proximity with wildcards, for example
"solr mail*"~10 ?

Is this the right approach to support these features?

Thanks,
Mark





On Wed, Aug 4, 2010 at 2:24 PM, Frederico Azeiteiro 
frederico.azeite...@cision.com wrote:

 Thanks for your idea.

 At this point I'm logging each query time. My idea is to divide my
 queries into normal queries and heavy queries. I have some heavy
 queries that take 1 or 2 minutes to return results. But they have, for
 instance, (*word1* AND *word2* AND word3*). I guess that these will
 always be slower (could be a little faster with
 ReversedWildcardFilterFactory) but they will never be ready in a few
 seconds. For now, I just increased the timeout for those :) (using
 solrnet).

 My priority at the moment is the queries phrases like word1* word2*
 word3. After this is working, I'll try to optimize the heavy queries

 Frederico


 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: quarta-feira, 4 de Agosto de 2010 01:41
 To: solr-user@lucene.apache.org
 Subject: Re: wildcard and proximity searches

 Frederico Azeiteiro wrote:
 
  But it is unusual to use both leading and trailing * operator. Why are
  you doing this?

  Yes I know, but I have a few queries that need this. I'll try the
  ReversedWildcardFilterFactory.
 
 
 

 ReverseWildcardFilter will help leading wildcard, but will not help
 trying to use a query with BOTH leading and trailing wildcard. it'll
 still be slow. Solr/lucene isn't good at that; I didn't even know Solr
 would do it at all in fact.

 If you really needed to do that, the way to play to solr/lucene's way of
 doing things would be to have a field where you actually index each
 _character_ as a separate token. Then leading and trailing wildcard
 search is basically reduced to a phrase search, but where the words
 are actually characters.  But then you're going to get an index where
 pretty much every token belongs to every document, which Solr isn't that
 great at either, but then you can apply commongram stuff on top to
 help that out a lot too. Not quite sure what the end result will be,
 I've never tried it.  I'd only use that weird special char as token
 field for queries that actually required leading and trailing wildcards.

 Figuring out how to set up your analyzers, and what (if anything) you're
 going to have to do client-app-side to transform the user's query into
 something that'll end up searching like a phrase search where each
 'word' is a character is left as an exercise for the reader. :)

 Jonathan




-- 
Nipen Mark
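
The character-per-token idea in Jonathan's quoted reply can be pictured with a small Python model: if every character is a token, a query like *wor* becomes a phrase match over consecutive character tokens. This is only a toy in-memory model of the concept, not Solr analyzer configuration, and the function names are made up.

```python
def char_tokens(text):
    """Index-time analysis in the toy model: one lowercased token per character."""
    return list(text.lower())


def matches_infix(field_value, wildcard_query):
    """Treat *abc* as a phrase query over consecutive character tokens."""
    needle = char_tokens(wildcard_query.strip("*"))
    haystack = char_tokens(field_value)
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


print(matches_infix("Keyword", "*wor*"))
print(matches_infix("Keyword", "*wro*"))
```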


Re: wildcard and proximity searches

2010-10-05 Thread Mark N
Thanks Ahmet.

Is it also possible to search for documents that have a field ENDING with
week* ?

The query should return documents whose field ends with "week" or one of its
derivatives, such as "weekly" or "weeks".

So the above query should return:

this week
Past three weeks
Report weekly

Thanks,
chandan



On Tue, Oct 5, 2010 at 5:04 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Also does this plugin allow us to use proximity with wild
  card
  *  solr mail*~10 *
 

 Yes it supports "solr mail*"~10 kind of queries without any problem.

 Currently it throws an exception with "mail*" kind of queries, but they are
 not valid phrase queries, because there is only one clause inside the quotation
 marks.






-- 
Nipen Mark


Re: question on wild card

2010-07-15 Thread Mark N
Thanks Erick.

One more question:

When "the perfect world*" is passed as the search query, it is converted to
"? perfect world". What does the "?" mean?

Since I am using the standard analyzer, I thought the stop word "the" would be removed.

Thanks


On Thu, Jul 15, 2010 at 7:01 AM, Erick Erickson erickerick...@gmail.com wrote:

 The best way to understand how things are parsed is to go to the solr admin
 page (Full interface link?) and click the debug info box and submit your
 query. That'll tell you exactly what happens.

 Alternatively, you can put debugQuery=on on your URL...

 HTH
 Erick

 On Wed, Jul 14, 2010 at 8:48 AM, Mark N nipen.m...@gmail.com wrote:

  I have a database field = "hello world" and I am indexing it to the *text* field
  with the standard analyzer (text is a copy field in Solr).
 
  Now, when a user gives the query text:hello world%, how is the query
  interpreted in the background?
 
  Are we actually searching text:hello OR text:world% (assuming the
  default operator is OR)?
 
 
 
 
 
 
  --
  Nipen Mark
 




-- 
Nipen Mark


question on wild card

2010-07-14 Thread Mark N
I have a database field = "hello world" and I am indexing it to the *text* field
with the standard analyzer (text is a copy field in Solr).

Now, when a user gives the query text:hello world%, how is the query
interpreted in the background?

Are we actually searching text:hello OR text:world% (assuming the
default operator is OR)?






-- 
Nipen Mark
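
A very simplified model of how a whitespace-separated query is split into clauses may help frame the question: a `field:` prefix applies only to the term it is attached to, and bare terms fall back to the default field. The real query parser also handles quotes, operators, and analysis, so the authoritative answer comes from the debugQuery=on output that Erick mentions in his reply; the function and the `content` default field below are hypothetical.

```python
def split_clauses(query, default_field):
    """Toy model: each whitespace-separated token becomes a (field, term) clause."""
    clauses = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and field and term:
            clauses.append((field, term))  # explicit field prefix
        else:
            clauses.append((default_field, token))  # bare term uses the default field
    return clauses


clauses = split_clauses("text:hello world%", default_field="content")
print(clauses)
```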


Two analyzer per field

2010-07-12 Thread Mark N
Is it possible to specify two analyzers per field?

For example, consider a field *F1* (keyword analyzer) = cheers mate
and a field *F2* (keyword analyzer) = hello world

There is also a copy field *TEXT* (standard analyzer), which will store
the terms { cheers mate hello world }.

Now, when a user performs any search, we look only at the copy field TEXT,
which uses the standard analyzer. Suppose the user searches for the phrase
"hello world": it will not return any result,
because the terms hello and world are tokenized.

Is it possible to also index "hello world" as it is into the *TEXT* field,
i.e. can I use the keyword analyzer as well as the standard analyzer for
the field TEXT?
What would be a better approach to handle this situation?





-- 
Nipen Mark


Solr DataImportHandler

2010-04-08 Thread Mark N
Is it possible to use the Solr DataImportHandler when the database fields are
not fixed? As far as I can tell, we need to configure which table (entity)
the data is read from and which database fields map to which
fields in the Solr schema.

Since in my case the database fields could be dynamic, can DIH be helpful?

Please suggest.


-- 
Nipen Mark


indexing a huge data

2010-03-05 Thread Mark N
What would be the fastest way to index documents? I am indexing a huge
collection of data after extracting certain metadata,
for example the author and filename of each file.

I am extracting this information and storing it in XML format,

for example:

<fileid>1</fileid><author>abc</author>
<filename>abc.doc</filename>
<fileid>2</fileid><author>abc</author>
<filename>abc1.doc</filename>

I cannot index these documents directly in Solr because they are not in the format
required by Solr (I cannot change the format, as it is used in other modules).

Would converting these files to CSV be a better and faster approach
compared to XML?

Please suggest.




-- 
Nipen Mark
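
If the CSV route is taken, the conversion itself is small. The Python sketch below assumes the records are wrapped in a `<files>` root with one `<file>` element per record; that wrapper is a guess, since the message only shows the inner elements.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed layout: a <files> root with one <file> element per record.
SAMPLE = """<files>
  <file><fileid>1</fileid><author>abc</author><filename>abc.doc</filename></file>
  <file><fileid>2</fileid><author>abc</author><filename>abc1.doc</filename></file>
</files>"""

FIELDS = ["fileid", "author", "filename"]


def xml_to_csv(xml_text):
    """Convert the metadata records to CSV text with a header row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for record in root.iter("file"):
        writer.writerow([(record.findtext(name) or "").strip() for name in FIELDS])
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE)
print(csv_text)
```

For inputs too large to parse in one go, a streaming parser such as `xml.etree.ElementTree.iterparse` would be the natural substitute for `fromstring`.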


Re: Getting max/min dates from solr index

2010-02-16 Thread Mark N
Thanks.
Is it possible to do date faceting across multiple Solr shards?

I am using indexes created in two different shards to do date faceting on the
field DATE:

*
http://localhost:8983/solr/1_13_1_3/select?shards=localhost:8983/solr/index1/,localhost_two:8983/solr/index/&start=0&rows=20&q=*&facet=true&facet.date=DATE&facet.date.start=2004-01-01T00:00:00Z&facet.date.end=2011-01-01T00:00:00Z&facet.date.gap=%2B1YEAR
*




On Fri, Feb 12, 2010 at 3:39 AM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Mark,

 Yes, facets will give you that information. Min/max StatsComponent?
  See http://www.search-lucene.com/?q=StatsComponent

  Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/



 - Original Message 
  From: Mark N nipen.m...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wed, February 10, 2010 8:12:43 AM
  Subject: Getting max/min dates from solr index
 
  How can we get the max and min date from the Solr index ? I would need
 these
  dates to draw a graph ( for example timeline graph )
 
 
  Also can we use date faceting to show how many documents are indexed
 every
  month  .
  Consider I need to draw a timeline graph for current year to show how
 many
  records are indexed for every month  .So i will have months in X axis and
 no
  of document in Y axis.
 
  What should be the better approach to design a schema to achieve this
  functionality ?
 
 
  Any suggestions would be appreciated
 
  thanks
 
 
  --
  Nipen Mark




-- 
Nipen Mark
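
Hand-assembled request URLs like the one in this message are easy to get wrong, for instance the `+` in the gap must be sent as `%2B`. A small Python sketch that builds the same kind of sharded date-facet request with proper encoding follows; the match-all query `*:*` is an assumption, and the hosts and core names are copied from the message.

```python
from urllib.parse import urlencode


def date_facet_url(base_url, shards, field, start, end, gap):
    """Build a distributed date-facet request; values are URL-encoded for us."""
    params = [
        ("shards", ",".join(shards)),
        ("q", "*:*"),  # assumed match-all query
        ("start", "0"),
        ("rows", "20"),
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", start),
        ("facet.date.end", end),
        ("facet.date.gap", gap),
    ]
    return base_url + "?" + urlencode(params)


url = date_facet_url(
    "http://localhost:8983/solr/1_13_1_3/select",
    ["localhost:8983/solr/index1/", "localhost_two:8983/solr/index/"],
    "DATE", "2004-01-01T00:00:00Z", "2011-01-01T00:00:00Z", "+1YEAR",
)
print(url)
```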


Getting max/min dates from solr index

2010-02-10 Thread Mark N
How can we get the max and min dates from the Solr index? I need these
dates to draw a graph (for example, a timeline graph).


Also, can we use date faceting to show how many documents are indexed every
month?
Say I need to draw a timeline graph for the current year to show how many
records are indexed each month, so I will have months on the X axis and the number
of documents on the Y axis.

What would be the better approach to designing a schema to achieve this
functionality?


Any suggestions would be appreciated.

Thanks


-- 
Nipen Mark
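
The data needed for such a timeline is the earliest date, the latest date, and a count per month. The Python sketch below computes these on the client from a list of timestamps, purely to show the target shape; on the Solr side the same numbers would more likely come from the stats and date-faceting features discussed in the replies to this question.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(iso_dates):
    """Return (earliest, latest, {YYYY-MM: count}) for ISO-8601 UTC timestamps."""
    parsed = [datetime.strptime(d, "%Y-%m-%dT%H:%M:%SZ") for d in iso_dates]
    counts = Counter(p.strftime("%Y-%m") for p in parsed)
    return min(parsed), max(parsed), dict(sorted(counts.items()))


dates = ["2010-01-05T10:00:00Z", "2010-01-20T08:30:00Z", "2010-02-10T12:00:00Z"]
earliest, latest, per_month = monthly_counts(dates)
print(earliest.date(), latest.date(), per_month)
```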


solr updateCSV

2010-01-07 Thread Mark N
I am trying to use Solr's CSV updater to index the data. I am trying to
specify the .DAT format, which consists of a field separator, a text qualifier, and a
line separator,

for example:

<field 1><field separator><field 2><field separator><line separator>
<text qualifier>value for field 1<text qualifier><field separator><text
qualifier>value for field 2<text qualifier><field separator><line
separator>

Can we specify the text qualifier and line separator as well?

I have tested that we can specify a separator, and it works well.



-- 
Nipen Mark


Indexing large text documents

2010-01-05 Thread Mark N
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("Fulltext", strContent);  // strContent holds the file's text

strContent is a String variable which contains the contents of a text file
(assume that the text file is located at c:\files\abc.txt).

In my case abc.txt (the text file) could be very large, ~2 GB, so it is not
always possible to read and store it in a String variable
before indexing. Can anyone suggest a better approach to indexing
these huge text files?



-- 
Nipen Mark


Enumerating wildcard terms

2009-12-08 Thread Mark N
Is it possible to enumerate all terms that match a specified wildcard
filter term, similar to Lucene's WildcardTermEnum API?

For example, if I search for abc*, I should be able to access all the
terms abc1, abc2, abc3... that exist in the index.

What would be a better approach to provide this functionality?




-- 
Nipen Mark


Re: nested solr queries

2009-11-30 Thread Mark N
Hi Shalin,

I am trying to achieve something like a JOIN. Previously I was doing this with
two queries on Solr.

solr index = (field1, field2, field3)

query1 = (for example field1=ABC)

Suppose query1 returns the result set set1 = { 1, 2, 3, 4 } which matches query1.

query2 = (get all records having field2=xyz for each record, i.e. for
set1 = { 1, 2, 3, 4 } returned by query1)

I am not sure if I could do something like this using the nested Solr query
from this link:

http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/



thanks


On Mon, Nov 30, 2009 at 1:50 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Nov 30, 2009 at 1:19 PM, Mark N nipen.m...@gmail.com wrote:

  Is it possible to write nested queries in Solr similar to sql like query
  where  I can take results of the first query and use one or more of its
  fields as an argument in the second query.
 
 
 That sounds like a join. If so, the answer would be no.


 
  For example:
 
  field1:XYZ AND (_query_: field3:{value of field4})
 
  This should search for all types of XYZ and then iterate over the result
  set
  and perform a query for where field3  is equal to the value of field1
 from
  each item of the first result set.
 
 
 Your description is not consistent with the query you have given. If
 field:XYZ is specified, then what are types of XYZ? Also, if you want to
 perform a query where field3 is equal to the value of field1 then, what is
 field4 in the query you have given?


  this is similar to SQL like query
 
 
  select distinct ( fieldA ) from table where fieldA  IN
 

 That sounds similar to faceting. See
 http://wiki.apache.org/solr/SimpleFacetParameters

 Perhaps you can give more details on what you want to achieve.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: nested solr queries

2009-11-30 Thread Mark N
We do not know field2=xyz until we run query1.

To put it simply, I was actually trying to do some kind of JOIN similar to the following
SQL query:


 select * from table1 where field2 in
 ( select field2 from dbo.concept_db where field1='ABC' )

If this is not possible, then I will have to run the inner query
( select field2 from dbo.concept_db where field1='ABC' ) first and only then run the
outer query.

thanks
chandan
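
The two-step fallback described in this message (run the inner query, then feed its field2 values into the outer query) can be sketched as follows. The documents are plain dictionaries standing in for the first query's results, and the helper names are made up; only the query-string construction is shown.

```python
def escape_term(value):
    """Quote a value so it is treated as one literal term."""
    return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')


def second_query(first_results, join_field):
    """Build the outer query from the distinct join values of the inner results."""
    values = sorted({doc[join_field] for doc in first_results if join_field in doc})
    if not values:
        return None  # inner query matched nothing, so there is nothing to join on
    return "%s:(%s)" % (join_field, " OR ".join(escape_term(v) for v in values))


# Stand-in for the documents returned by the inner query (field1:ABC).
first_results = [
    {"id": 1, "field2": "xyz"},
    {"id": 2, "field2": "pqr"},
    {"id": 3, "field2": "xyz"},
]
outer = second_query(first_results, "field2")
print(outer)
```

For large inner result sets the OR list can grow long, which is one reason de-normalizing the schema may be the more practical route.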




On Mon, Nov 30, 2009 at 2:25 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Nov 30, 2009 at 2:02 PM, Mark N nipen.m...@gmail.com wrote:

  hi shalin
 
  I am trying to achieve something like JOIN. Previously am doing this with
  two queries on solr
 
  solr index  = ( field1 ,field 2, field3)
 
  query1 = (  for  example field1=ABC )
 
  suppose query1 returns results set1= { 1, 2 ,3 ,4 } which matches query1
 
  query2 = (   get all records having field2=xyz for each records  i.e
  for
  set1= {1,2,3,4} returned by query1 )
 
 
 That sequence of queries will return documents which have field1=ABC and
 field2=xyz. The same result can be obtained in one query with
 q=+field1:ABC +field2:xyz

 Have I misunderstood the problem?


  Am not sure if I could do something like this using the nested solr query
  from link
 
  http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
 
 
 No, nested queries can only influence scores. They do not filter the
 results.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: nested solr queries

2009-11-30 Thread Mark N
Thanks for your help. So do you think I should execute the Solr queries twice,
or are there any other workarounds?




On Mon, Nov 30, 2009 at 3:07 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Nov 30, 2009 at 2:26 PM, Mark N nipen.m...@gmail.com wrote:

  field2=xyz we dont know until we run query1
 
 
 Ah, ok. I thought xyz was a literal that you wanted to search.


  To simply i was actually trying to do some kind of JOIN similar to
  following
  SQL query
 
 
   select  * from table1  where  *field2*  in
   ( select *field2  *from dbo.concept_db where field1='ABC' )
 
  if this is not possible then i will have to search inner query  (
  select *field2
  *from dbo.concept_db where field1='ABC' )  first and then only  run the
  outer query
 
 
 No, there are no joins in Solr. Consider de-normalizing your schema, if you
 haven't.

 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Nipen Mark


nested solr queries

2009-11-29 Thread Mark N
Is it possible to write nested queries in Solr, similar to a SQL query,
where I can take the results of the first query and use one or more of its
fields as an argument in the second query?


For example:

field1:XYZ AND (_query_: field3:{value of field4})

This should search for all types of XYZ and then iterate over the result set
and perform a query where field3 is equal to the value of field1 from
each item of the first result set.

This is similar to a SQL query like


select distinct ( fieldA ) from table where fieldA  IN