terms component misleading results

2012-05-25 Thread Cam Bazz
Hello,

I need to know the exact count of certain terms in the documents. I
noticed that when I update a document (only one field, for testing),
the term count goes up by 1 for that specific term. For example, if I have
two documents in the index, each with tag=ccc, and I update one of the
documents, the term frequency for ccc becomes 3. When I optimize the
index, it goes back down to the correct number (2).

Is there any way to get the exact term frequency?

Regular querying works well, but I did not quite understand why the
terms count is misleading.

Best Regards,
C.B.


upgrade to 3.6

2012-05-25 Thread Cam Bazz
Hello,

I have upgraded from 1.4 to 3.6. It went quite smoothly, using the same
schema.xml.

I have done some testing, and I have not found any problems yet. Soon
I will migrate the production system to 3.6

Any recommendations on this matter? Maybe I skipped something?

Best Regards,
C.B.


Re: upgrade to 3.6

2012-05-25 Thread Cam Bazz
Hello,

I have tested, but was not able to replicate the problem.

(Basically, I indexed a few documents with UTF-8 characters, then searched
for them, and found them fine.)

On the issue, at 27/Apr/12 08:56:

 the fix is now committed to 3.6 branch

I just recently downloaded 3.6. Actually, it seems I
downloaded it at 2012-04-27 19:27 GMT+2 (from the file timestamp).

Does that mean that I was lucky?

Best,


On Fri, May 25, 2012 at 10:17 AM, Sami Siren ssi...@gmail.com wrote:
 Hi,

 If you're using non ascii data with solrj you might want to test that
 it works for you properly. See for example
 https://issues.apache.org/jira/browse/SOLR-3375

 --
  Sami Siren

 On Fri, May 25, 2012 at 10:11 AM, Cam Bazz camb...@gmail.com wrote:
 Hello,

 I have upgraded from 1.4 to 3.6 - it went quite smooth, using the same
 schema.xml

 I have done some testing, and I have not found any problems yet. Soon
 I will migrate the production system to 3.6

 Any recommendations on this matter? Maybe I skipped something?

 Best Regards,
 C.B.


Re: terms component misleading results

2012-05-25 Thread Cam Bazz
Oh ok, I got it.

So if I update the document three times, does that mean I have 1
normal document and 2 marked for deletion?

Because the max difference was 1 - no matter how many times you update.

I think I can manage the faceting to do what I need. I guess that will
be faster than making a real query, and extracting the full docs.

Best Regards,
-C.B.

On Fri, May 25, 2012 at 10:14 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : the terms count go +1 for that specific term. for example, if I have
 : two documents in index, each with tag=ccc and if I update one of the
 : documents, the terms frequency for ccc becomes 3. when I optimize the
 : index, it goes down again to correct number. (2)

 http://wiki.apache.org/solr/TermsComponent

 Retrieving terms in index order is very fast since the implementation
 directly uses Lucene's TermEnum to iterate over the term dictionary.
 ...
 The doc frequencies returned are the number of documents that match the
 term, including any documents that have been marked for deletion but
 not yet removed from the index.

 : Is there any way to get the exact term frequency?

 field faceting.


 -Hoss
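
To compare the two counts discussed above, here is a minimal Python sketch that only builds the two request URLs: one for the TermsComponent, whose document frequencies can include documents marked for deletion, and one for a field facet on a match-all query. The base URL, the /terms handler path, and the field name tag are assumptions for illustration; use whatever your solrconfig.xml and schema define.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def terms_url(field, base=SOLR_BASE):
    """URL for the TermsComponent; its counts may include deleted documents."""
    params = {"terms": "true", "terms.fl": field, "wt": "json"}
    return base + "/terms?" + urlencode(params)


def facet_url(field, base=SOLR_BASE):
    """URL for a field facet over all live documents; no rows are fetched."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]
    return base + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(terms_url("tag"))
    print(facet_url("tag"))
```

Fetching both URLs before and after an optimize should show the terms count dropping while the facet count stays the same.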


representing latlontype in pojo

2011-11-08 Thread Cam Bazz
Hello,

I have custom POJOs, and I use SolrJ to read and index them with the
getBeans() method.

So now I want to store a spatially searchable data member in my POJO.


I have in my schema.xml:

<fieldType name="location" class="solr.LatLonType"
subFieldSuffix="_coordinate"/>

and

<field name="location" type="location" indexed="true" stored="true"/>

so, what object type must I have in my bean? LatLonType does not seem
to have a constructor, or getX, getY methods, and I think it is
internal to solr.

How can I store a 2d point and index it to a field type that is
latlontype, if I am using solrj?

Best Regards,
C.B.
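
A solr.LatLonType field is fed as a plain "lat,lon" string, so one option (an assumption here, not something confirmed in this thread) is to keep a String member in the bean mapped to the location field and convert to and from coordinates in application code. The sketch below shows that conversion in Python for illustration; the function names are hypothetical.

```python
def to_latlon_value(lat, lon):
    """Format a point as the 'lat,lon' string a LatLonType field is fed with."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % (lat,))
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % (lon,))
    return "%s,%s" % (repr(float(lat)), repr(float(lon)))


def from_latlon_value(value):
    """Parse a stored 'lat,lon' string back into a (lat, lon) tuple of floats."""
    lat_text, lon_text = value.split(",")
    return float(lat_text), float(lon_text)


if __name__ == "__main__":
    stored = to_latlon_value(41.0, 29.0)
    print(stored)
    print(from_latlon_value(stored))
```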


synonyms file, and example cases

2011-01-24 Thread Cam Bazz
Hello,

I have been looking at the example Solr synonyms file, and I
did not understand some of the notation:

aaa => aaaa

bbb => bbbb1 bbbb2

ccc => cccc1,cccc2

a\=>a => b\=>b

a\,a => b\,b

fooaaa,baraaa,bazaaa

The first one says to search for aaaa when the query is aaa. Am I correct?
The second one finds bbbb1 bbbb2 when the query is bbb.
The third one finds cccc1 or cccc2 when the query is ccc.

I have not understood the fourth and fifth ones.

The last one, I assume, is a group: a bidirectional mapping between
fooaaa, baraaa, and bazaaa.

I am especially interested in this last one. If I write aaa,bbb, will it
find aaa and bbb when either aaa or bbb is queried?

Am I correct in those assumptions?

Best regards,
C.B.
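
One way to read the notation is to parse it by hand. The following is a simplified Python sketch, not Solr's own parser: it treats => as an explicit mapping, commas as list separators, and a backslash as an escape for the next character, which is how the fourth and fifth rules get a literal => or comma into a term. For a plain comma-separated group it assumes the expand=true behaviour, where every term maps to the whole group. The function names are hypothetical.

```python
import re


def split_unescaped(text, sep):
    """Split text on sep, skipping any character preceded by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape for later passes
            i += 2
        elif text.startswith(sep, i):
            parts.append("".join(current))
            current = []
            i += len(sep)
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(term):
    """Drop the backslash from escaped characters such as \\, or \\=."""
    return re.sub(r"\\(.)", r"\1", term)


def terms_of(side):
    """Split one side of a rule into its comma-separated terms."""
    return [unescape(t.strip()) for t in split_unescaped(side, ",")]


def parse_rule(line):
    """Return a dict mapping each input term to its list of replacements."""
    sides = split_unescaped(line, "=>")
    if len(sides) == 2:  # explicit mapping: left terms become right terms
        replacements = terms_of(sides[1])
        return {term: replacements for term in terms_of(sides[0])}
    group = terms_of(line)  # equivalence group, assuming expand=true
    return {term: group for term in group}


if __name__ == "__main__":
    for rule in ["bbb => bbbb1 bbbb2", "ccc => cccc1,cccc2",
                 r"a\,a => b\,b", "fooaaa,baraaa,bazaaa"]:
        print(rule, "->", parse_rule(rule))
```

Under these assumptions, a rule such as aaa,bbb would map both aaa and bbb to the pair aaa, bbb, which matches the bidirectional reading in the question.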


making rotating timestamped logs from solr output

2010-07-08 Thread Cam Bazz
Hello,

I would like to log the Solr console output. Although Solr logs requests in
a timestamped format, this only logs the requests, i.e. it does not log the
number of hits for a given query, etc.

Is there any easy way to do this other than resorting to methods for
capturing Solr's output? I usually run Solr on my server by starting the
screen command first, running Solr, and then detaching from the console.

It would be nice to have output logging instead of just request logging.

best regards,
c.b.


Re: faceting question

2009-01-24 Thread Cam Bazz
Is there no other way than to use the patch?

Query A is a superset of B, after all.

If it is not doable, I will probably use some caching technique.

Best.

On Sat, Jan 24, 2009 at 9:14 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Sat, Jan 24, 2009 at 6:56 AM, Cam Bazz camb...@gmail.com wrote:

 Hello;

 I got a multiField named tagList which may contain multiple tags. I am
 making a query like:

 tagList:a AND tagList:b AND tagList:c

 and I am also getting a tagList facet returning me some values.

 What I would like is Solr to return me facets as if the query was:
 tagList:a AND tagList:b

 is it even possible?


 If I understand correctly,
 1. You want to query for tagList:a AND tagList:b AND tagList:c
 2. At the same time, you want to request facets for tagList but only for
 tagList:a and tagList:b

 If that is correct, you can use the features introduced by
 https://issues.apache.org/jira/browse/SOLR-911

 However you may need to put #1 as fq instead of q.
 --
 Regards,
 Shalin Shekhar Mangar.
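
A minimal sketch of the request described above, assuming the tag/exclude local-params syntax that came out of SOLR-911: the clause that should not affect the facet counts goes into a tagged fq, and the facet field excludes that tag. The tag name extra and the helper name are arbitrary choices for illustration.

```python
from urllib.parse import urlencode


def multiselect_params(base_terms, extra_term, field="tagList"):
    """Query on all terms, but compute facets as if extra_term were absent."""
    q = " AND ".join("%s:%s" % (field, term) for term in base_terms)
    return [
        ("q", q),
        # The extra clause is a tagged filter so the facet can exclude it.
        ("fq", "{!tag=extra}%s:%s" % (field, extra_term)),
        ("facet", "true"),
        ("facet.field", "{!ex=extra}%s" % field),
    ]


if __name__ == "__main__":
    params = multiselect_params(["a", "b"], "c")
    print("/select?" + urlencode(params))
```

The documents returned still match tagList:a AND tagList:b AND tagList:c, while the tagList facet is counted over tagList:a AND tagList:b only.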



faceting question

2009-01-23 Thread Cam Bazz
Hello;

I have a multivalued field named tagList, which may contain multiple tags. I am
making a query like:

tagList:a AND tagList:b AND tagList:c

and I am also getting a tagList facet that returns some values.

What I would like is for Solr to return facets as if the query were:
tagList:a AND tagList:b

Is that even possible?

Best,
-C.B.


Re: feeding data

2008-10-09 Thread Cam Bazz
Hello Erik,

I am especially interested in how to integrate it into a GlassFish/EJB3
environment.

In the past, I have used something like a proxy servlet to forward the
request and get back the response. It is rather bothersome.

Also, for indexing I need some sort of API access.

Has anyone integrated Solr into a servlet/EJB3 based system?

Best Regards,
-C.B.


On Thu, Sep 4, 2008 at 3:32 PM, Erik Hatcher [EMAIL PROTECTED] wrote:

 On Sep 4, 2008, at 8:27 AM, Cam Bazz wrote:

 hello,
 is there no other way then making xml files and feeding those to solr?

 I just want to feed solr programmatically. - without xml

 There are several options.  You can feed Solr XML, or CSV, or use any of the
 Solr client APIs (though those use XML under the covers for indexing
 documents, but transparently).  A more advanced option is to use Solr in
 embedded mode where you use its Java API directly with no intermediate
 representation needed.

Erik
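
As an illustration of the first option, here is a minimal Python sketch that builds the XML add message in memory from plain dictionaries and posts it, so no XML files are written by hand. The update URL and the field names are assumptions; post_update is only defined here, not called, since it needs a running Solr.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

# Assumed update URL of a local Solr instance.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build an <add> message from a list of {field name: value(s)} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes one <field> element per value.
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="utf-8")


def post_update(xml_bytes, url=UPDATE_URL):
    """POST an update message to Solr and return the raw response body."""
    request = Request(url, data=xml_bytes,
                      headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    docs = [{"id": "1", "title": "a & b", "tag": ["x", "y"]}]
    print(build_add_xml(docs).decode("utf-8"))
```

A commit still has to be sent afterwards, for example by posting b"<commit/>" through the same function.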




feeding documents tru API

2008-10-09 Thread Cam Bazz
Hello,

I have been looking at the API documentation, but I don't know where to
look in order to feed documents through the API without using XML files.

Any ideas?

Best.
-C.B.


feeding data

2008-09-04 Thread Cam Bazz
Hello,
Is there no other way than making XML files and feeding those to Solr?

I just want to feed Solr programmatically, without XML.

Best.


Re: adding documents with json post

2008-06-23 Thread Cam Bazz
thanks a bunch.

On Mon, Jun 23, 2008 at 4:39 AM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:

 Hi Cam,

 Yes, the various other formats are for responses only, as far as I'm aware.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 ----- Original Message -----
  From: Cam Bazz [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Sunday, June 22, 2008 5:53:56 PM
  Subject: adding documents with json post
 
  hello;
 
  this has probably been asked before, but how can I add documents through a
  JSON AJAX submit?
 
  does solr only accept XML input?
 
  Best.
  -C.B.




html to text based on some sort of uniqueness metric

2008-06-09 Thread Cam Bazz
Hello,

I am indexing newspaper articles as an exercise in Solr. When dealing with
newspaper articles in the past, I always tried to get the div or
the table that contains the actual news, using NekoHTML to traverse the
DOM tree and get the text from the div or table that contains the
article. When dealing with many newspapers, it is a hassle to write custom code to
extract the relevant information. There is usually a lot of garbage in the HTML,
from categories to ads, and furthermore the pages change, so static coding is
problematic.

I have been wondering whether I could measure the frequency or uniqueness of each
node and find the news automatically, but I have not come up with an
implementation.

Has anyone done, contemplated, or used something similar? Maybe there is already a
way, using Lucene or even Hadoop.

Best Regards,
-C.A.
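
One possible reading of the uniqueness idea, sketched in Python under stated assumptions: fetch several pages from the same newspaper, split each into text blocks (the DOM traversal itself is assumed to happen elsewhere), and count on how many pages each block occurs. Blocks that repeat across many pages are treated as navigation or ads, and the longest remaining block is taken as the article. The 0.5 threshold and the function names are arbitrary choices, not a tested method.

```python
from collections import Counter


def unique_blocks(pages, max_share=0.5):
    """Keep blocks that occur on at most max_share of the pages.

    pages is a list of pages, each a list of text blocks (strings).
    """
    page_count = Counter()
    for blocks in pages:
        page_count.update(set(blocks))  # count each block once per page
    limit = max_share * len(pages)
    return [[b for b in blocks if page_count[b] <= limit] for blocks in pages]


def main_text(blocks):
    """Pick the longest surviving block as the likely article body."""
    return max(blocks, key=len) if blocks else ""


if __name__ == "__main__":
    pages = [
        ["Home | Sports", "First article body text", "Ads"],
        ["Home | Sports", "Second article body", "Ads"],
        ["Home | Sports", "Third", "Ads"],
    ]
    for blocks in unique_blocks(pages):
        print(main_text(blocks))
```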


Re: Solr system and numbers

2008-06-09 Thread Cam Bazz
I have a similar question:
how would one normalize, or even detect, whether a string is a phone number?
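
A rough Python sketch of both steps, done outside Solr before indexing: normalization strips everything except digits, so 1234, 12 34 and 1 234 all become the same value, and detection accepts strings made only of digits and common separators with a plausible digit count. The 7 digit minimum is an assumption; the 15 digit maximum follows the E.164 numbering limit. The function names are hypothetical.

```python
import re

# Digits plus the separators commonly seen in written phone numbers.
PHONE_SHAPE = re.compile(r"^\+?[\d\s().-]+$")


def normalize_phone(text):
    """Reduce a phone number to its digits, e.g. '12 34' becomes '1234'."""
    return re.sub(r"\D", "", text)


def looks_like_phone(text, min_digits=7, max_digits=15):
    """Heuristic check: phone-like characters only and a plausible length."""
    text = text.strip()
    if not PHONE_SHAPE.match(text):
        return False
    return min_digits <= len(normalize_phone(text)) <= max_digits


if __name__ == "__main__":
    for sample in ["+90 (212) 555-0100", "12 34", "year 1998"]:
        print(sample, looks_like_phone(sample), normalize_phone(sample))
```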

On Mon, Jun 9, 2008 at 4:17 PM, dudes dudes [EMAIL PROTECTED] wrote:


 great info ,,, thanks a lot all


 
  Date: Mon, 9 Jun 2008 05:58:50 -0700
  From: [EMAIL PROTECTED]
  Subject: Re: Solr system and numbers
  To: solr-user@lucene.apache.org
 
  Hi,
  Solr/Lucene can treat phone numbers as strings.  If you want to clean
 them up and normalize them outside of Solr, you can do that and feed them
 into Solr as pure numbers.
 
  How the phone numbers will be treated after you pump them into Solr
 depends on the analyzer you choose to use for this data.  If you don't need
 to search on subsets of phone numbers, then just don't tokenize them (i.e.
 use string type if the phone numbers contain any non-numeric characters,
 sint otherwise).
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
  ----- Original Message -----
  From: dudes dudes
  To: solr-user@lucene.apache.org
  Sent: Monday, June 9, 2008 2:10:20 PM
  Subject: Solr system and numbers
 
 
  Hello experts,
 
  How does Solr deal with numbers or phone numbers .. For example if you
 have 1234
  and 12 34 or 1 234... with spaces between the numbers ..
  Or this is dealt by lucene ?
 
  any documentations or tutorial on this ?
 
  many thanks,
  ak



solr query syntax

2008-06-05 Thread Cam Bazz
Hello,

How can we write a query that restricts one clause to a certain field and
searches the rest in the default field?

For example, can I do:

year:1998 AND searchword

Best Regards,
-C.B.
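
A minimal sketch of such a request in Python: the fielded clause year:1998 is scoped to the year field, while the bare word is searched in the default field, which can be named per request with the df parameter. The base URL and the default field name text are assumptions.

```python
from urllib.parse import urlencode


def select_url(query, default_field, base="http://localhost:8983/solr"):
    """Build a /select URL; unfielded terms are searched in default_field."""
    params = {"q": query, "df": default_field}
    return base + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(select_url("year:1998 AND searchword", "text"))
```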


Re: Announcement of Solr Javascript Client

2008-05-29 Thread Cam Bazz
I have done something similar: I am using a search servlet that
forwards the request to Solr through Commons HttpClient.

Maybe it could help against DoS, although DoS is still possible.

Best.

-Cam Bazz

On Thu, May 29, 2008 at 8:04 PM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:

 I just had a look at the demo and reeeally like it!

 I didn't pay enough attention to this thread, though.  Is the main concern
 that by having a Solr search webapp that is really all in UI and uses your
 JS library, the backend Solr server is directly exposed and thus somebody
 could peek in the web page source, figure out Solr's address, and start
 issuing delete and other damaging requests?

 I think somebody mentioned a Servlet Filter.  Couldn't we simply supply a
 servlet filter that allows only some request URLs, possibly reading those
 URLs from an external file, thus allowing easy customization?


 This dynamic stuff looks very juicy.

 Question about scalability:
 How much is cached either client-side?  With every new letter I type, is JS
 hitting Solr, or is there some caching (planned) on the client?

 Danke,
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 ----- Original Message -----
  From: Matthew Runo [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Thursday, May 29, 2008 12:50:25 PM
  Subject: Re: Announcement of Solr Javascript Client
 
  Wow. This is really pretty cool. You're much further along than I
  thought you were! I'd love to see this in as an 'official' Solr client.
 
  Thanks!
 
  Matthew Runo
  Software Developer
  Zappos.com
  702.943.7833
 
  On May 29, 2008, at 8:15 AM, Matthias Epheser wrote:
 
   The server was rebooted yesterday without my knowledge, so the jetty
   is restarted and should be reachable at
  http://lovo.test.dev.indoqa.com/mepheser/moobrowser/
  
   As you can see, this first demo uses widget classes and is built
   with mootools.




solr feed problem

2008-05-18 Thread Cam Bazz
hello,

I am trying to feed solr with xml files of my own schema, and I am getting:

SEVERE: org.xmlpull.v1.XmlPullParserException: entity reference names can
not start with character '\ufffd'

My XML is UTF-8 for sure, as is the text inside, but for some reason I
get this exception and then Solr crashes.

Any ideas?

Best Regards,
-C.B.


exception while feeding converted text from pdf

2008-05-14 Thread Cam Bazz
Hello,

I made a simple Java program to convert my PDFs to text, and then to an XML
file.
I am getting a strange exception. I think the converted files have some
errors. Should I encode the text string that I extract from the PDFs in a
special way?

Best,
-C.B.

SEVERE: org.xmlpull.v1.XmlPullParserException: entity reference names can not
start with character ' ' (position: START_TAG seen
...ay\n  latitude 59 ...
@80:64)
    at org.xmlpull.mxp1.MXParser.parseEntityRef(MXParser.java:2212)
    at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1275)
    at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
    at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
    at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
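
The parser message points at an & in the field text that is not followed by a valid entity name, which suggests the extracted text was written into the XML without escaping. A minimal Python sketch of the cleaning step, assuming the text is inserted as element content: drop characters that XML 1.0 does not allow (PDF extraction often leaves form feeds and other control characters), then escape &, < and >. The function name is hypothetical, and the original program is in Java, where the same two steps would apply.

```python
import re
from xml.sax.saxutils import escape

# Everything outside the XML 1.0 "Char" production is matched for removal.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(text):
    """Remove characters XML 1.0 forbids, then escape &, < and >."""
    text = _INVALID_XML_CHARS.sub("", text)
    return escape(text)


if __name__ == "__main__":
    raw = "latitude 59 & longitude 10\x0c next page"
    print("<field name=\"text\">%s</field>" % clean_for_xml(raw))
```

Building the document with an XML library instead of string concatenation would take care of the escaping step automatically, though not of the forbidden control characters.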


Re: indexing pdf documents

2008-05-13 Thread Cam Bazz
Yes, I have seen the documentation on RichDocumentRequestHandler at the
http://wiki.apache.org/solr/UpdateRichDocuments page.
However, from what I understand, this just feeds documents to Solr. How can I
construct something like document_id, document_name, document_text and feed
it in? (I.e., my documents have labels.)

Best.
-C.B.

On Tue, May 13, 2008 at 1:30 AM, Chris Harris [EMAIL PROTECTED] wrote:

 Solr does not have this support built in, but there's a patch for it:

 https://issues.apache.org/jira/browse/SOLR-284

 On Mon, May 12, 2008 at 2:02 PM, Cam Bazz [EMAIL PROTECTED] wrote:
  Hello,
 
   Before making a little program to extract the txt from my pdfs and feed it
   into solr with xml, I just wanted to check if solr has capability to digest
   pdf files apart from xml?
 
   Best Regards,
   -C.B.
 



indexing pdf documents

2008-05-12 Thread Cam Bazz
Hello,

Before writing a little program to extract the text from my PDFs and feed it
into Solr as XML, I just wanted to check whether Solr has the capability to digest
PDF files in addition to XML.

Best Regards,
-C.B.