Is it possible to run the phoenix query server on a machine other than the regionservers?

2015-12-17 Thread F21
I have successfully deployed phoenix and the phoenix query server into a 
toy HBase cluster.


I am currently running the HTTP query server on all regionservers, 
however I think it would be much better if I could run the HTTP query 
servers on separate Docker containers or machines. This way, I could 
easily scale the number of query servers and put them behind a DNS name 
such as phoenix.mycompany.internal.


I've had a look at the configuration, but it seems to be heavily tied to 
HBase. For example, it requires the HBASE_CONF_DIR environment variable 
to be set.


Is this something that's currently possible?


Help calling CsvBulkLoadTool from Java Method

2015-12-17 Thread Cox, Jonathan A
I'm wondering if somebody can provide some guidance on how to use 
CsvBulkLoadTool from within a Java class, instead of via the command line as 
shown in the documentation. I'd like to determine whether CsvBulkLoadTool ran 
without throwing any exceptions. However, exceptions generated by 
org.apache.phoenix.* don't seem to propagate up to the calling method 
(ToolRunner.run), nor do they result in a non-zero return code.

Here is what I am doing:
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;
import org.apache.hadoop.util.ToolRunner;

CsvBulkLoadTool csvBulkLoader = new CsvBulkLoadTool();
int tret;
String[] args = {"-d", "\t", "--table", "MyTableName", "--input", 
"file://myfile.tsv"};

tret = ToolRunner.run(csvBulkLoader, args);

When I run this code, I get a return code (tret) of zero and no errors in the 
console output. However, the data is not loaded into HBase. When running the 
same commands on the command line, I discovered that Phoenix can throw various 
errors (e.g. wrong column datatype, permission error, whatever...). But there 
doesn't seem to be a way for me to discover these errors, either from exceptions 
thrown by CsvBulkLoadTool or via the return code.
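
In case it clarifies what I'm after, here is roughly the shape of wrapper I'm 
imagining (only a sketch; passing an explicit Configuration built from 
HBaseConfiguration.create() is my own guess, and the table name and input path 
are the same placeholders as above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class BulkLoadRunner {
    public static void main(String[] args) throws Exception {
        // Pick up hbase-site.xml from the classpath so the tool knows where the cluster is.
        Configuration conf = HBaseConfiguration.create();

        String[] loadArgs = {"-d", "\t", "--table", "MyTableName",
                "--input", "file://myfile.tsv"};

        try {
            // ToolRunner.run returns the tool's exit code; anything non-zero means failure.
            int exitCode = ToolRunner.run(conf, new CsvBulkLoadTool(), loadArgs);
            if (exitCode != 0) {
                System.err.println("CsvBulkLoadTool failed with exit code " + exitCode);
            }
        } catch (Exception e) {
            // Surface anything the tool throws rather than letting it disappear silently.
            e.printStackTrace();
        }
    }
}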

What's the best way to determine if CsvBulkLoadTool ran without error?

Thanks,
Jonathan


Re: Is it possible to run the phoenix query server on a machine other than the regionservers?

2015-12-17 Thread F21

Hey Rafa,

So in terms of the hbase-site.xml, I just need the entries for the 
location of the zookeeper quorum and the zookeeper znode for the cluster, 
right?


Cheers!

On 17/12/2015 9:48 PM, rafa wrote:

Hi F21,

You can install the Query Server on any server that has a network connection 
to your cluster. You'll need connectivity to ZooKeeper.


Usually the Apache Phoenix Query Server is installed on the master nodes.

According to the Apache Phoenix doc: 
https://phoenix.apache.org/server.html


"The server is packaged in a standalone jar, 
phoenix-server--runnable.jar. This jar and HBASE_CONF_DIR on 
the classpath are all that is required to launch the server."


You'll only need that jar and the HBase XML config files.

Regards,
Rafa.

On Thu, Dec 17, 2015 at 11:31 AM, F21 wrote:


I have successfully deployed phoenix and the phoenix query server
into a toy HBase cluster.

I am currently running the http query server on all regionserver,
however I think it would be much better if I can run the http
query servers on separate docker containers or machines. This way,
I can easily scale the number of query servers and put them
against a DNS name such as phoenix.mycompany.internal.

I've had a look at the configuration, but it seems to be heavily
tied to HBase. For example, it requires the HBASE_CONF_DIR
environment variable to be set.

Is this something that's currently possible?






Re: Is it possible to run the phoenix query server on a machine other than the regionservers?

2015-12-17 Thread rafa
I think so. Copy the hbase-site.xml from the cluster onto the new Query
Server machine and add the directory where the XML resides to the classpath
of the Query Server. That should be enough.
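
For example, at a minimum something like this should be in that hbase-site.xml
(these are the standard HBase property names; the hostnames and znode below are
just examples, adjust them to your cluster):

<configuration>
  <!-- tells the Query Server where the cluster's ZooKeeper quorum is -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.mycompany.internal,zk2.mycompany.internal,zk3.mycompany.internal</value>
  </property>
  <!-- only needed if the cluster uses a non-default znode parent -->
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>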

Regards
rafa

On Thu, Dec 17, 2015 at 12:21 PM, F21  wrote:

> Hey Rafa,
>
> So in terms of the hbase-site.xml, I just need the entries for the
> location to the zookeeper quorum and the zookeeper znode for the cluster
> right?
>
> Cheers!
>
>
> On 17/12/2015 9:48 PM, rafa wrote:
>
> Hi F21,
>
> You can install Query Server in any server that has network connection
> with your cluster. You'll need connection with zookeeper.
>
> Usually the Apache Phoenix Query Server is installed in the master nodes.
>
> According to the Apache Phoenix doc:
> https://phoenix.apache.org/server.html
>
> "The server is packaged in a standalone jar,
> phoenix-server-<version>-runnable.jar. This jar and HBASE_CONF_DIR on the
> classpath are all that is required to launch the server."
>
> you'll only need that jar and the Hbase XML config files,
>
> Regards,
> Rafa.
>
> On Thu, Dec 17, 2015 at 11:31 AM, F21  wrote:
>
>> I have successfully deployed phoenix and the phoenix query server into a
>> toy HBase cluster.
>>
>> I am currently running the http query server on all regionserver, however
>> I think it would be much better if I can run the http query servers on
>> separate docker containers or machines. This way, I can easily scale the
>> number of query servers and put them against a DNS name such as
>> phoenix.mycompany.internal.
>>
>> I've had a look at the configuration, but it seems to be heavily tied to
>> HBase. For example, it requires the HBASE_CONF_DIR environment variable to
>> be set.
>>
>> Is this something that's currently possible?
>>
>
>
>


Java Out of Memory Errors with CsvBulkLoadTool

2015-12-17 Thread Cox, Jonathan A
I am trying to ingest a 575MB CSV file with 192,444 lines using the 
CsvBulkLoadTool MapReduce job. When running this job, I find that I have to 
boost the max Java heap space to 48GB (24GB fails with Java out of memory 
errors).

I'm concerned about scaling issues. It seems like it shouldn't require 24 to 
48 GB of memory to ingest a 575 MB file. However, I am pretty new to 
Hadoop/HBase/Phoenix, so maybe I am off base here.

Can anybody comment on this observation?

Thanks,
Jonathan


Questions: history of deleted records, controlling timestamps

2015-12-17 Thread John Lilley
Greetings,

I've been reading about Phoenix with an eye toward implementing a "versioned 
database" on Hadoop.  It looks pretty slick, especially the ability to query at a 
past timestamp.  But I can't figure out what happens with deleted records.  Are 
all versions deleted, or can I still go back in time and see the versions 
before the delete?

Also, I would like to be able to make a set of changes "at the same timestamp" 
to get a changeset-like ability similar to a VCS.  It looks like the APIs 
allow setting the effective timestamp for all change operations; is that 
true?
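
For instance, I'm picturing something like the snippet below (just a sketch based 
on my reading of the docs; I'm assuming the CurrentSCN connection property is the 
right knob, and the ZooKeeper host, table name, and timestamp are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ChangesetExample {
    public static void main(String[] args) throws Exception {
        // Assumed: the CurrentSCN connection property pins the effective timestamp
        // (epoch millis) for everything done on this connection.
        Properties props = new Properties();
        props.setProperty("CurrentSCN", Long.toString(1450300000000L)); // made-up timestamp

        // Hypothetical ZooKeeper host and table name.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181", props)) {
            conn.createStatement().executeUpdate("UPSERT INTO MYTABLE VALUES (1, 'change-1')");
            conn.createStatement().executeUpdate("UPSERT INTO MYTABLE VALUES (2, 'change-2')");
            conn.commit(); // both rows should land at the same timestamp
        }
    }
}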

Thanks
John Lilley