I am trying to use the CsvBulkUploadTool to get data from Hive to HBase.
As I typically do, I created a Hive table with a copy of the data that I care
about, using these properties:
row format delimited fields terminated by '|' null defined as 'null' stored as
textfile location 'my location'
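Pieced together, a minimal sketch of that kind of Hive DDL (the table name, columns, and path here are placeholders, not the poster's actual names):

```sql
-- Hypothetical external Hive table using the properties described above
CREATE EXTERNAL TABLE my_copy (
  meter_key    STRING,
  sample_point BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
NULL DEFINED AS 'null'
STORED AS TEXTFILE
LOCATION '/my/location';
```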
I have a column on a table that is set to varchar(40).
I need to increase that 40, but I don't want to lose any of the data in the
table.
The only suggestions I've seen online involve dropping the column and
re-creating it, or creating a new table. But I would like to preserve the name
of
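If the table in question is a Hive table, widening a varchar is a metadata-only change that preserves both the data and the column name; a sketch (table and column names are placeholders):

```sql
-- Hive: widen my_col from VARCHAR(40) to VARCHAR(100) in place,
-- without dropping the column or the table
ALTER TABLE my_table CHANGE COLUMN my_col my_col VARCHAR(100);
```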
Our cluster recently had some issues related to network outages.
When all the dust settled, HBase eventually "healed" itself, and almost
everything is back to working well, with a couple of exceptions.
In particular, we have one table where almost every (Phoenix) query times out -
which was
I have a table with a primary key that performs well, as well as 2 indexes,
which I created like this:
CREATE INDEX _indexed_meterkey_v2 on
_indexed_meterkey_immutable_v2 (meter_key)
(the names are just some obfuscation for the purposes of
posting here)
We WERE running Phoenix 4.6, which I had
:
CAST(my_bigint as DATE)
Thanks,
James
On Tue, Apr 5, 2016 at 6:31 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
I have ms-based, GMT timestamps in BigInt columns in one of my phoenix tables.
It’s easy to work with these in Java, but I’
I have ms-based, GMT timestamps in BigInt columns in one of my phoenix tables.
It's easy to work with these in Java, but I'm struggling to find the right
syntax to easily read them in a simple query.
For example: '1458132989477'
I know this is Wed, 16 Mar 2016 12:56:29.477 GMT
But when I do
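The thread is truncated here, but Phoenix does support casting a millisecond-epoch BIGINT directly to a time type, so a sketch of the kind of query being discussed (column and table names are assumptions):

```sql
-- Interpret a ms-since-epoch BIGINT column as a TIMESTAMP (GMT)
SELECT CAST(my_bigint AS TIMESTAMP) FROM my_table;
```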
, 2016 at 8:10 AM, Riesland, Zack <zack.riesl...@sensus.com>
wrote:
> I have a handful of VERY small phoenix tables (< 100 entries).
>
>
>
> I wrote some javascript to interact with the tables via servlet + JDBC.
>
>
>
> I can query the data almost instantaneously
I have a handful of VERY small phoenix tables (< 100 entries).
I wrote some javascript to interact with the tables via servlet + JDBC.
I can query the data almost instantaneously, but upserting is extremely slow -
on the order of tens of seconds to several minutes.
The main write operation
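One common cause of slow single-row writes over Phoenix JDBC is committing after every UPSERT; batching many rows per commit is the usual fix. A sqlline-flavored sketch (table and values are placeholders):

```sql
!autocommit off
UPSERT INTO my_table VALUES ('key1', 1);
UPSERT INTO my_table VALUES ('key2', 2);
-- ...many more rows...
!commit
```

The same idea applies in Java: call setAutoCommit(false) on the connection, execute the batch of upserts, then commit once.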
Hey folks,
Everything I've read online about connecting Phoenix and Tableau is at least a
year old.
Has there been any progress on an ODBC driver?
Any simple hacks to accomplish this?
Thanks!
I have a similar data pattern and 100ms response time is fairly consistent.
I’ve been trying hard to find the right set of configs to get closer to 10-20ms
with no luck, but I’m finding that 100ms average is pretty reasonable.
From: Willem Conradie [mailto:willem.conra...@pbtgroup.co.za]
Sent:
We are able to ingest MUCH larger sets of data (hundreds of GB) using the
CSVBulkLoadTool.
However, we have found it to be a huge memory hog.
We dug into the source a bit and found that
HFileOutputFormat.configureIncrementalLoad(), in using TotalOrderPartitioner
and KeyValueReducer,
o do here (if you're up for it) is recompile the
phoenix jars (at least the fat client jar) against the specific version of
HBase that you've got on your cluster. Assuming that all compiles, it should
resolve this issue.
- Gabriel
On Fri, Dec 11, 2015 at 2:01 PM, Riesland, Zack <zack.r
, 2015 at 10:41 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Thanks Samarth,
I’m running hbase 0.98.4.2.2.8.0-3150 and phoenix 4.6.0-HBase-0.98
The hbase stuff is there via the HDP 2.2.8 install. It worked before upgrading
to 4.6.
From:
get 3800ms for stmt.executeQuery() itself or did that time include time
spent in retrieving records via resultSet.next() too?
On Thu, Dec 10, 2015 at 7:38 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Thanks,
I did some experimenting.
Now, a
This morning I tried running the same operation from a data node as well as a
name node, where phoenix 4.2 is completely gone, and I get the exact same error.
From: Riesland, Zack
Sent: Tuesday, December 08, 2015 8:42 PM
To: user@phoenix.apache.org
Subject: CsvBulkUpload not working after
hbase-client jar in place?
- Samarth
On Wed, Dec 9, 2015 at 4:30 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
This morning I tried running the same operation from a data node as well as a
name node, where phoenix 4.2 is completely gon
, at 12:20 PM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
James,
2 quick followups, for whatever they’re worth:
1 – There is nothing phoenix-related in /tmp
2 – I added a ton of logging, and played with the properties a bit, and I think
I s
4, 2015 at 9:09 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
SHORT EXPLANATION: a much higher percentage of queries to Phoenix return
exceptionally slowly after querying very heavily for several minutes.
LONGER EXPLANATION:
I’ve been u
lly the
conversation is helpful to the whole Phoenix community.
From: Riesland, Zack
Sent: Friday, December 04, 2015 1:36 PM
To: user@phoenix.apache.org
Cc: geoff.hai...@sensus.com
Subject: RE: Help tuning for bursts of high traffic?
Thanks, James
I'll work on gathering more information.
In the meant
I'm using Phoenix + Aqua Data Studio.
For other kinds of (jdbc) connections, I can run multiple queries:
select a, b, c from d;
select x from y;
However, Phoenix doesn't seem to like the trailing semicolon. If I have a
semicolon character at the end of a line, I get an error like this:
ERROR
you have a stack trace from the log output from when you got this error?
And could you tell me if the table name that is being complained about there is
an index table name?
Tracing through the code, it looks like you could get this exception if an
index table doesn't exist (or somehow isn't available), wh
Hello,
We recently upgraded our Hadoop stack from HDP 2.2.0 to 2.2.8
The phoenix version (phoenix-4.2.0.2.2.8.0) and HBase version (0.98.4.2.2.8.0)
did not change (from what I can tell).
However, some of our CSVBulkLoadTool jobs have started to fail.
I'm not sure whether this is related to
somewhere
(earlier) in the classpath.
I don't know anything about Aqua Data Studio, but could it be that it somehow
bundles support for HBase 0.94 somewhere (or perhaps there is another JDBC
driver on the class path that works with HBase 0.94)?
- Gabriel
On Wed, Sep 30, 2015 at 1:37 PM, Riesland
:10 PM, Riesland, Zack <zack.riesl...@sensus.com>
wrote:
> Thanks Gabriel,
>
> I replaced all the Hadoop and hbase related jars under Aqua Data
> Studio/lib/apache with the appropriate ones from our cluster and I *think* I
> made some progress.
>
> Seems l
Hello,
Can someone tell me whether it is possible to specify a Capacity Scheduler
queue for the CSVBulkLoadTool's MapReduce job to use?
Thanks!
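The standard Hadoop answer should apply here: the bulk load tool is a regular MapReduce job, so the queue can be selected with the generic `mapreduce.job.queuename` property, passed as a `-D` option before the tool's own arguments. A sketch (the jar path, queue name, table, and input path are placeholders):

```
hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    -Dmapreduce.job.queuename=my_queue \
    --table MY_TABLE --input /path/to/data.csv
```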
into your table?
- Gabriel
On Mon, Aug 31, 2015 at 3:20 PM Riesland, Zack
<zack.riesl...@sensus.com> wrote:
I’m looking for some pointers on speeding up CsvBulkImport.
Here’s an example:
I took about 2 billion rows from hive and exported them to CSV.
]
Sent: Tuesday, September 01, 2015 6:43 AM
To: user@phoenix.apache.org
Subject: Re: Help Tuning CsvBulkImport MapReduce
On Tue, Sep 1, 2015 at 11:29 AM, Riesland, Zack <zack.riesl...@sensus.com>
wrote:
> You say I can find information about spills in the job counters. Are
> you t
I'm looking for some pointers on speeding up CsvBulkImport.
Here's an example:
I took about 2 billion rows from hive and exported them to CSV.
HDFS decided to translate this to 257 files, each about 1 GB.
Running the CsvBulkImport tool against this folder results in 1,835 mappers and
then 1
of manually salting it and managing
that yourself?
Thanks,
James
On Sat, Jul 25, 2015 at 4:04 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
I decided to start from scratch with my table schema in attempt to get a better
distribution across my regions/region servers
On Sat, Jul 25, 2015 at 4:04 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
I decided to start from scratch with my table schema in attempt to get a better
distribution across my regions/region servers.
So, I created a table like this:
CREATE TABLE
Subject: Re: Exception from RowCounter
PHOENIX-1248 is marked as resolved. Are you using a version of Phoenix before
this fix?
On Sun, Jul 26, 2015 at 7:22 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Thanks James,
I am not able to use salt_buckets because
to scan over more rows than if the primary key
(A, B, C) were defined.
- Gabriel
1. http://phoenix.apache.org/skip_scan.html
On Thu, Jul 23, 2015 at 11:45 AM Riesland, Zack
<zack.riesl...@sensus.com> wrote:
This is probably a silly question… please humor me: I’m a Java
as it does a similar task of reading from a Phoenix table and
writes the data into the target table using bulk load.
Regards
Ravi
On Wed, Jul 22, 2015 at 6:23 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
I want to play with some options for splitting a table
the rows are in your table. On an 8-node
cluster, creating an index with 3 columns (char(15), varchar and
date) on a 1 billion row table takes about 1 hour 15 minutes.
How many rows does your table have and how wide are they?
On Wed, Jul 22, 2015 at 8:29 AM, Riesland, Zack zack.riesl...@sensus.com
I have a table like this:
CREATE TABLE fma.er_keyed_gz_meterkey_split_custid (
meter_key varchar not null,
...
sample_point integer not null,
...
endpoint_id integer,
...
CONSTRAINT pk_rma_er_keyed_filtered PRIMARY KEY (meter_key, sample_point)
)
I want to play with some options for splitting a table to test performance.
If I were to create a new table and perform an upsert select * to the table,
with billions of rows in the source table, is that like an overnight operation
or should it be pretty quick?
For reference, we have 6
This is my first time messing with a secondary index in Phoenix.
I used this syntax:
create index fma_er_keyed_gz_endpoint_id_include_sample_point on
fma.er_keyed_gz_meterkey_split_custid (endpoint_id) include (sample_point)
SALT_BUCKETS = 550;
and I get this error:
[Error Code: 1029, SQL
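The truncated error is consistent with Phoenix's cap on salting: SALT_BUCKETS must be between 0 and 256, so 550 is out of range. A sketch with a legal value (index and table names copied from the post; 256 is just the maximum, not a recommendation):

```sql
CREATE INDEX fma_er_keyed_gz_endpoint_id_include_sample_point
ON fma.er_keyed_gz_meterkey_split_custid (endpoint_id)
INCLUDE (sample_point)
SALT_BUCKETS = 256;
```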
.
On Tue, Jul 21, 2015 at 11:39 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
This is my first time messing with a secondary index in Phoenix.
I used this syntax:
create index fma_er_keyed_gz_endpoint_id_include_sample_point on
fma.er_keyed_gz_meterkey_split_custid
If the counts are, indeed, different, then the next question is: how are you
getting data from hive to phoenix?
From: anil gupta [mailto:anilgupt...@gmail.com]
Sent: Tuesday, July 14, 2015 3:48 AM
To: user@phoenix.apache.org
Subject: Re: Phoenix vs Hive
You can do major compaction via HBase
This is probably a lame question, but can anyone point me in the right
direction for CHANGING an EXISTING primary key on a table?
I want to add a column.
Is it possible to do that without dropping the table?
Thanks!
, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Thanks James,
To clarify: the column already exists on the table, but I want to add it to the
primary key.
Is that what your example accomplishes?
From: James Taylor
[mailto:jamestay...@apache.org]
table
ALTER TABLE t ADD my_new_col VARCHAR PRIMARY KEY
The new column must be nullable and the last existing PK column cannot be
nullable and fixed width (or varbinary or array).
On Tue, Jul 14, 2015 at 10:01 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote
, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Thanks James,
That’s what I thought.
If I were to make a NEW table with the same columns, is there a simple way to
copy the data from the old table to the new one?
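Phoenix can do this server-side with UPSERT SELECT, assuming the new table's columns are compatible with the old one's (table names here are placeholders):

```sql
-- Copy all rows from the old table into the new one
UPSERT INTO new_table SELECT * FROM old_table;
```

On billions of rows this still runs as one large client-driven operation, so it can take a long time, as discussed elsewhere in these threads.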
From: James Taylor
[mailto:jamestay
).
Assuming that at least one of these works for you (or even if they don't),
could you add a ticket in the Phoenix JIRA
(https://issues.apache.org/jira/browse/PHOENIX) so that we can track getting a
more structural fix for this issue?
- Gabriel
On Tue, Jul 7, 2015 at 4:53 PM Riesland, Zack
?
From: Krishna [mailto:research...@gmail.com]
Sent: Monday, July 06, 2015 3:11 PM
To: user@phoenix.apache.org
Subject: Re: Permissions Question
The owner of the directory containing HFiles should be the 'hbase' user, and
ownership can be set using the 'chown' command.
On Mon, Jul 6, 2015 at 7:12 AM, Riesland
Hello,
I'm attempting to use the CsvBulkLoader tool from a new edge node.
This edge node is not a data node or region server node on our cluster.
It is intended to be used for running scripts and interacting with the cluster
nodes.
I manually installed all the phoenix files (I copied
running on the localhost and/or isn't configured
in the local configuration (see http://phoenix.apache.org/bulk_dataload.html).
- Gabriel
On Mon, Jul 6, 2015 at 12:08 PM Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Hello,
I’m attempting to use the CsvBulkLoader
I've been running CsvBulkLoader as 'hbase' and that has worked well.
But I now need to integrate with some scripts that will be run as another user.
When I run under a different account, the CsvBulkLoader runs and creates the
HFiles, but then encounters permission issues attempting to write the
After using the CsvBulkLoader successfully for a few days, I’m getting some
strange behavior this morning.
I ran the job on a fairly small ingest of data (around 1/2 billion rows).
It seemed to complete successfully. I see this in the logs:
Phoenix MapReduce Import
After some investigation, I think this is a permissions issue.
If I run as ‘hbase’, this works consistently.
FYI
From: Riesland, Zack
Sent: Wednesday, July 01, 2015 7:25 AM
To: user@phoenix.apache.org
Subject: Help interpreting CsvBulkLoader issues?
After using the CsvBulkLoader successfully
.
On Friday, June 26, 2015, Riesland, Zack
<zack.riesl...@sensus.com>
wrote:
I wrote a Java program that runs nightly and collects metrics about our hive
tables.
I would like to include HBase tables in this as well.
Since select count(*) is slow
through the
connection to 600000 milliseconds (10 mins).
You can also set phoenix.query.timeoutMs in your client-side
hbase-site.xml and it'll be used for the query timeout for all connections.
Thanks,
James
On Mon, Jun 29, 2015 at 2:44 AM, Riesland, Zack zack.riesl...@sensus.com
wrote:
Thanks, James!
Can you point me to some instructions or some syntax for setting those
timeout values in Java code?
I’ve
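Per the reply above, the client-side override lives in an hbase-site.xml on the application's classpath; a minimal sketch (the 600000 value is just the 10-minute example from the thread):

```xml
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>600000</value>
</property>
```

The same key can also be supplied programmatically as a connection property on the Properties object passed to DriverManager.getConnection.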
I wrote a Java program that runs nightly and collects metrics about our hive
tables.
I would like to include HBase tables in this as well.
Since select count(*) is slow and not recommended on Phoenix, what are my
alternatives from Java?
Is there a way to call
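One alternative mentioned later in these threads is HBase's MapReduce RowCounter, which can be launched from Java via ToolRunner or from a shell; the shell form is (table name is a placeholder, and for a Phoenix table this counts the underlying HBase rows):

```
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'MY_TABLE'
```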
jobs, but any kind of
job), as then if there is any kind of cleanup or something similar done in the
driver program, it'll still get run even if the ssh session gets dropped.
- Gabriel
On Thu, Jun 25, 2015 at 8:47 PM Riesland, Zack
<zack.riesl...@sensus.com> wrote
On Thu, Jun 25, 2015 at 3:11 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Earlier this week I was surprised to find that, after dumping tons of data from
a Hive table to an HBase table, about half of the data didn’t end up in HBase.
So, yesterday, I created a new
drop your ssh connection.
- Gabriel
On Thu, Jun 25, 2015 at 8:27 PM Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Thanks Gabriel,
Then perhaps I discovered something interesting.
After my last email, I created a new table with the exact same script, except I
be around the same value as
well.
Could you post the values that you've got on those counters?
- Gabriel
On Thu, Jun 25, 2015 at 4:41 PM Riesland, Zack
<zack.riesl...@sensus.com> wrote:
I started writing a long response, and then noticed something:
When I created my
is not sufficient. You have to increase the
HBase RPC timeout as well - hbase.rpc.timeout.
3. Upgrading to HBase 1.1 will resolve your timeout issues (it has support for
long-running scanners), but this is probably not an option?
-Vlad
On Tue, Jun 23, 2015 at 6:19 AM, Riesland, Zack
zack.riesl
returns immediately isn't necessarily a bad thing (as long as you're not
getting an error from it).
- Gabriel
On Wed, Jun 24, 2015 at 12:14 PM Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Quick update: I found that I am able to execute ‘update statistics’ on other
Bytes Written=702177539
From: Riesland, Zack
Sent: Tuesday, June 23, 2015 9:20 AM
To: 'user@phoenix.apache.org'
Subject: RE: How To Count Rows In Large Phoenix Table?
Anil: Thanks for the tip about mapreduce.RowCounter. That takes about 70
minutes, but it works!
Unfortunately, I only got about 60
hopefully
result in nearly the same thing (in terms of querying) as if you were to split
the regions.
- Gabriel
On Tue, Jun 23, 2015 at 7:56 PM Riesland, Zack
<zack.riesl...@sensus.com> wrote:
Thanks Gabriel,
That’s all very helpful.
I’m not at all sure
I had a very large Hive table that I needed in HBase.
After asking around, I came to the conclusion that my best bet was to:
1 - export the hive table to a CSV 'file'/folder on the HDFS
2 - Use the org.apache.phoenix.mapreduce.CsvBulkLoadTool to import the data.
I found that if I tried to pass
, Riesland, Zack zack.riesl...@sensus.com
wrote:
Whenever I run a non-typical query (not filtered by the primary key),
I get an exception like this one.
I tried modifying each of the following in custom hbase-site to
increase the
timeout:
hbase.client.scanner.timeout.period
@phoenix.apache.org
Subject: RE: How to change region size limit
It totally depends on the type of query you would be running.
If it's a point query then it makes sense; otherwise, aggregates and top-N queries might
run slow, with more load on the client for deriving the final result.
From: Riesland, Zack [mailto:zack.riesl
At the Hadoop Summit last week, some guys from Yahoo presented on why it is
wise to keep region size fairly small and region count fairly large.
I am looking at my HBase config, but there are a lot of numbers that look like
they're related to region size.
What parameter limits the data size of
I'm new to HBase and to Phoenix.
I needed to build a GUI off of a huge data set from HDFS, so I decided to
create a couple of Phoenix tables, dump the data using the CSV bulk load tool,
and serve the GUI from there.
This all 'works', but as the data set grows, I would like to improve my table
on table splitting
Can you provide the Queries which you would be running on your table?
Also use the MR Bulkload instead of using the CSV load tool.
From: Riesland, Zack [mailto:zack.riesl...@sensus.com]
Sent: Monday, June 15, 2015 4:03 PM
To: user@phoenix.apache.orgmailto:user@phoenix.apache.org
Whenever I run a non-typical query (not filtered by the primary key), I get an
exception like this one.
I tried modifying each of the following in custom hbase-site to increase the
timeout:
hbase.client.scanner.timeout.period
hbase.regionserver.lease.period
hbase.rpc.shortoperation.timeout
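Per one of the replies elsewhere in this archive, the scanner timeout alone is not sufficient and hbase.rpc.timeout must be raised too; a client-side hbase-site.xml sketch (the values are examples, not recommendations):

```xml
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value>
</property>
```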